1.  INTRODUCTION

In this project our thesis has been that Command, Control and
Communications (C³) problems and other large scale decentralized decision and
control  problems  which  are  traditionally  modeled  as  team  problems  can  be 
investigated  much  more  effectively  as  multiple-goal  multiple  decision  maker 
problems,  especially  under  uncertainty,  and  our  goal  has  been  to  develop  a 
framework  which  would  permit  us  to  perform  sensitivity  analysis  on  team-optimal 
and  leader-follower  policies  in  such  systems.  Our  general  model  has  involved 
hierarchies  in  decision  making,  informational  decentralization,  uncertainty 
in  the  available  information,  constraints  on  information  transmission-capabilities 
between  different  levels  of  hierarchies,  and  possible  discrepancies  between  the 
perceptions  of  different  decision  makers  of  the  common  goal. 

We  have  first  noted  that  in  order  to  model  large  scale  decentralized 
decision  and  control  problems  as  team  problems,  the  following  three  conditions 
should  be  satisfied: 

(i)  All  decision  makers  have  exactly  the  same  perception  of  an  existing 
common  goal,  and  quantify  this  perception  in  exactly  the  same  way; 

(ii)  All  decision  makers  have  access  to  a  common  (probabilistic) 
description  of  the  uncertainties  inherent  in  the  decision  environment; 

(iii)  All  decision  makers  adopt  (or  have  access  to)  exactly  the  same 
(mathematical)  model  of  the  underlying  system  that  characterizes  the  decision 
environment  and  possible  paths  of  evolution  of  the  decision  processes,  and  have 
access  to  the  same  relevant  information  with  regard  to  the  interactions  among, 
and  capabilities  of,  different  decision  makers. 
Any existing discrepancy between the perceptions of the decision
makers  with  regard  to  any  one  of  the  foregoing  three  conditions  leads  to  a 
decision  problem  which  can  no  longer  be  treated  as  a  team  problem.  Optimal 
decision  rules  derived  by  totally  ignoring  (or  overlooking)  this  aspect  of  the 
problem  in  general  lead  to  outcomes  which  are  extremely  sensitive  even  to  small 
variations  in  the  perceptions  of  the  decision  makers  from  the  common  nominal  model. 

Motivated  by  these  considerations,  we  have  undertaken,  within  the  scope 
of  this  project,  the  tasks  of  (i)  developing  a  methodology  for  performing  a 
sensitivity  analysis  on  deterministic  and  stochastic  team  and  multi-person  decision 
problems  in  the  face  of  deviations  (of  the  perceptions)  from  a  common  nominal 
model,  (ii)  using  this  methodology,  to  obtain  minimally  sensitive  decision  rules 
for different classes of deterministic and stochastic models, and (iii) investigating
the role of information and hierarchy in the derivation of such policies.

We  believe  that  in  the  scope  of  this  project  significant  contributions 
have  been  made  to  the  state  of  knowledge  on  these  tasks,  as  documented  in  the 
references  attached  to  the  report.  In  the  next  section  we  provide  brief 
descriptions  of  our  research  accomplishments  in  these  and  related  areas,  keyed 
to  the  reference  list  which  constitutes  Section  3  of  this  report. 


2.  RESEARCH  PROGRESS 


In  this  section,  we  briefly  outline  some  of  the  new  results  obtained 
in  the  scope  of  this  project,  in  five  categories;  full  details  can  be  found  in 
the  references  given,  which  are  attached  (in  full)  to  this  report. 

Category  A:  In  the  first  three  references  listed  in  Category  A,  we  have  studied 
the  sensitivity  of  leader-follower  policies  in  two-agent  deterministic  decision 
problems  to  variations  in  the  values  of  some  parameters  describing  the  objective 
functionals, for both Stackelberg and team problems. In [A1] we introduce an
appropriate sensitivity function and define a "robust" incentive
scheme  for  the  leader  as  one  that  minimizes,  in  addition  to  the  usual  (standard) 
Stackelberg  performance  index,  this  sensitivity  function.  Such  an  approach  has 
applications  in  decision  problems  wherein  the  leader  does  not  know  the  exact 
values  of  some  parameters  characterizing  the  follower's  cost  functional,  and 
seeks to "robustify" his optimum policy in the presence of deviations from the nominal
values. In [A1] we also provide an in-depth analysis of such incentive design
problems,  obtain  some  explicit  results  for  general  convex  cost  functionals,  and 
present  some  illustrative  examples.  In  [A2]  and  [A3],  on  the  other  hand,  we 
study  a  general  class  of  "nominal"  team  problems  with  two  agents  and  with  a 
hierarchical  decision  structure,  where  we  also  allow  one  of  the  decision  makers 
to  have  a  slightly  different  perception  of  the  overall  team  goal,  with  this 
slight  variation  not  known  by  the  other  agent  who  is  assumed  to  occupy  the 
hierarchically  dominant  position.  This  leading  agent  is  assumed  to  have  access 
to  dynamic  information,  and  his  role  is  to  announce  a  robust  policy  (incentive 
scheme)  which  would  lead  to  achievement  of  the  overall  team  goal  in  spite  of 
the slight variations in the other agent's perception of that goal. In the paper
we  obtain  such  robust  policies  for  the  leading  agent,  for  general  cost  functionals 
with  convex  structure,  and  also  show  that  in  some  special  cases  this  robust 
feature  of  the  incentive  scheme  is  maintained  regardless  of  the  magnitude  and 
nature  of  the  variations. 

In [A5] and [A6] we study a version of the problem of [A1] in a
stochastic  context,  which  is  a  challenging  dynamic  optimization  problem.  More 
specifically,  we  consider  a  class  of  stochastic  incentive  decision  problems  in 
which  the  leader  has  access  to  the  control  value  of  the  follower  and  to  private 
as  well  as  common  information  on  the  unknown  state  of  nature.  The  follower's 
cost  function  depends  on  a  finite  number  of  parameters  whose  values  are  not 
known  accurately  by  the  leader,  and  in  spite  of  this  parametric  uncertainty  the 
leader  seeks  a  policy  which  would  induce  the  desired  behavior  (in  a  stochastic 
equivalence sense) on the follower. In the paper, we obtain such appealing
policies  for  the  leader,  which  are  smooth,  induce  the  desired  behavior  at  the 
nominal  values  of  these  parameters,  and  furthermore  make  the  follower's  optimal 
reaction  either  minimally  sensitive  or  totally  insensitive  to  variations  in  the 
values of these parameters from the nominals. The general solution is determined
by  some  orthogonality  relations  in  some  appropriately  constructed  (probability) 
measure  spaces  and  leads  to  particularly  simple  incentive  policies  which  have 
no  counterparts  in  deterministic  problems. 

In  [A6]  we  address  a  class  of  nominally  stochastic  team  problems  and 
explore  the  impact  of  the  additional  degree  of  freedom  brought  in  by  the  team 
nature  of  the  problem  on  the  sensitivity  properties  of  different  team  solutions. 
Finally,  in  [A7],  which  is  currently  under  preparation,  we  provide  a  general 
survey  of  these  results  with  possible  extensions  and  applications. 
Category B: In the first two papers listed in Category B, we study a class of
decision  problems  in  which  this  time  the  probabilistic  description  of  the 
stochastic  variables  is  perceived  differently  by  different  agents.  We  first 
show  that  when  the  decision  makers  have  different  probabilistic  models  of  the 
stochastic environment, the resulting decision problem is a nonzero-sum (multicriteria)
stochastic game, even if the decision makers have a single common
goal  quantified  in  exactly  the  same  way  (say  by  a  cost  functional).  Hence, 
even  in  team  problems  the  corresponding  solution  concept  (team-optimal  solution) 
will  have  to  be  modified  when  discrepancies  exist  in  the  perception  of  the  agents 
of  the  probabilistic  model  of  the  decision  process.  The  currently  available 
theory of nonzero-sum stochastic games was not applicable to such problems, and
a new theory had to be developed. This is what we have accomplished in
these papers (cf. [B1], [B2] and [B4]), for two agent problems with static
information  patterns.  We  introduce  the  concept  of  "stable  equilibrium  solutions" 
for  decision  problems  with  multiple  probabilistic  models,  and  obtain  sufficient 
conditions  for  existence  and  uniqueness  of  such  equilibria  (under  a  symmetric  mode 
of  decision  making)  when  the  objective  functionals  are  quadratic  and  the  decision 
spaces  are  general  inner-product  spaces.  Furthermore,  for  the  special  case  of 
Gaussian  distributions  in  both  discrete  and  continuous-time  problems,  we  present 
in [B2] and [B4] some explicit stable equilibrium policies.

While [B1] and [B2] treat stochastic multimodeling under a symmetric
mode of decision making, [B3] and part of [B4] deal with the case of an asymmetric
mode of decision making. Here again, even in team problems, a discrepancy in
the perceptions of the decision makers of the underlying probabilistic model
leads to a stochastic nonzero-sum game, which is of a type not treated before
in  the  literature.  We  develop  a  general  equilibrium  theory  for  such  problems 
in  [B3]  and  [B4],  and  analyze  the  special  case  of  Gaussian  distributions  in 
greater  depth.  One  of  the  important  findings  of  this  analysis  is  that  while  the 
equilibrium  solution  for  the  symmetric  case  is  linear,  this  is  no  longer 
true  when  the  mode  of  decision  making  is  asymmetric.  In  this  latter  case,  the 
unique equilibrium solution is nonlinear even under Gaussian distributions, when
discrepancies exist. In the limiting case as discrepancies disappear, this
nonlinear solution degenerates into a linear one (and so does the best linear one),
thus  displaying  the  existence  of  a  bifurcation  phenomenon.  Reference  [B5], 
which  is  currently  under  preparation,  extends  these  results  to  decision  problems 
with  a  larger  number  of  agents  and  different  types  of  hierarchies. 

Category C: In paper [C1], we continue our earlier work on Stackelberg dynamic
games  and  consider  a  subclass  of  such  problems  in  which  the  leader  has 
informational  advantage  over  the  follower,  in  the  sense  that  the  leader  can 
observe  the  follower's  actions  at  each  stage  (before  he  (the  leader)  acts)  either 
perfectly  or  partially.  Under  a  feedback  Stackelberg  solution  concept  which  takes 
this  informational  advantage  into  account,  we  have  studied  derivation  of  optimal 
affine  policies  and  have  investigated  the  conditions  under  which  such  a  solution 
coincides  with  the  global  Stackelberg  solution.  A  second  set  of  results  obtained 
in [C1] involves an analysis of existence and derivation of causal real-time
implementable global Stackelberg solutions in dynamic games when the leader is
allowed  to  use  memory  policies. 
Category D: In the two papers listed in Category D, we have addressed the
important  problem  of  developing  a  general  equilibrium  theory  for  discrete 
and  continuous-time  dynamic  games  with  varying  (symmetrical  and  asymmetrical) 
modes of play, i.e. for games in which the solution concept itself and the
leadership are determined by past actions of the players and the outcome of some
(stochastic) process. In [D1] we study stochastic systems with structural and
modal uncertainties described by a finite state jump process, and introduce a
new concept of equilibrium (which we call "strong equilibrium") which encompasses
both the feedback Nash and feedback Stackelberg solution concepts for the special
cases of deterministic discrete-time games with symmetrical and asymmetrical
modes of play, respectively. This new equilibrium concept also provides a
convenient  framework  for  the  introduction  of  a  feedback  Stackelberg  solution 
concept  in  deterministic  differential  games.  For  the  general  class  of  stochastic 
nonzero-sum  dynamic  games  with  structural  and  modal  uncertainties,  and  under  the 
feedback  closed-loop  information,  we  obtain  the  optimality  conditions  in  both 
discrete  and  continuous  time.  We  also  study  certain  special  cases,  which  are 
further  discussed  in  [D2]  along  with  some  illustrative  examples. 

Category E: In paper [E1] we extend the currently available theory of dynamic
games  in  a  new  direction,  so  as  to  encompass  games  with  state  equations  of  order 
higher  than  one.  We  first  show  that  the  standard  state  augmentation  technique 
is  not  applicable  in  a  game  context,  and  therefore  a  new  technique  has  to  be 
developed  which  is  tailored  to  the  underlying  information  pattern.  In  the  paper 
we develop such a technique whereby we obtain an informationally unique Nash
equilibrium solution to a class of dynamic games of order higher than one, and
with  random  disturbances  in  the  state  equation. 
In the invited paper [E2], we first provide a brief review of some of
the recent results on dynamic games, in particular with regard to memory strategies,
and then discuss potential applications of the techniques developed in this
context to large scale systems design, optimization and coordination. We propose
a  number  of  design  criteria  for  coordinator  policies  in  interconnected  systems, 
and  provide  recipes  to  obtain  such  policies  with  good  sensitivity  properties. 
In this paper we introduce the notion of robust incentive schemes in multi-agent decision problems with a hierarchical decision
structure, and discuss the derivation of such policies by minimizing, in addition to the usual (standard) Stackelberg performance
indices, an appropriate sensitivity function. Such an approach has applications in decision problems wherein the leader does not
know the exact values of some parameters characterizing the follower's cost functional, and seeks to robustify his optimum policy
in the presence of deviations from the nominal values. An in-depth analysis of such incentive design problems is provided and
optimum robust incentive schemes are derived for general cost functionals with a convex structure. The results are then applied to
an incentive design problem arising in economics, leading to some meaningful robust incentive policies.

1.  Introduction

Optimum incentive design problems constitute a promising and mathematically challenging class of
decision problems in economics and operations research [6-8], and have also recently attracted the
attention of control theorists [2-5, 9-11] because of the close relationship with Stackelberg games
[1, 4, 12]. Viewed as a dynamic Stackelberg game, an optimum design problem involves a hierarchy in
decision-making and an information structure that allows the decision-maker at the top of the hierarchy
(to be called the leader) to acquire (perfect or partial) information on the actions of the other
decision-maker(s) (the so-called follower(s)). This available information enables the leader to design a
policy (called an incentive scheme) that forces the follower(s) to a desired (from the leader's point of
view) behavior. The ultimate goal sought by the leader may be described in the form of maximization of
a utility function (or minimization of a cost function), or it may be some set of utopic points in an
appropriate space, determined according to some criterion. The incentive scheme may be in the form of
a threat strategy [9] or some smooth policy with appealing regularity conditions, as discussed in [5]. This
latter reference, in particular, shows that in the case of two decision-maker problems, and when the
follower's cost function is strictly convex, there exists an optimal strategy for the leader which is
affine in the dynamic information. This, however, does not imply that the solution will be unique; in
fact, there will exist, in general, a multitude of optimal incentive schemes.

Existence of multiple solutions to incentive design problems prompts a further design criterion
according to which a selection can be made. In this paper we introduce such a selection criterion
under which robust incentive policies can be obtained for two-agent deterministic problems with strictly
convex cost functionals. More specifically, we assume that the leader does not know the exact value of a
parameter that characterizes the follower's cost function, and thereby his optimal response function. He
may take a nominal value for this unknown parameter (assuming that such a candidate exists) and seek
an optimal incentive scheme that is not only optimal for that chosen value but is also least sensitive to
deviations from the nominal. We call such incentive policies robust, and develop in this paper a new
approach for the derivation of such policies by introducing an appropriate sensitivity function.

The problem is formulated in precise mathematical terms in Section 2. Section 3 introduces sensitivity
functions relevant to the context and derives robust affine policies for a general class of problems with
convex cost functionals. A special class of problems with separable and singular cost functions is
treated in Section 4, leading to some explicit, analytic solutions. Section 5 deals with an extension of
these results to a larger class of incentive schemes which also include nonlinear policies. In Section 6,
the results of Sections 3 and 4 are applied to a problem arising in microeconomics, and it is shown that
the unique robust affine incentive policy for the leader bears a very meaningful economic interpretation.
The concluding remarks of Section 7 end the paper.

* This work was supported in part by the Office of Naval Research under Contract N00014-82-K-0469, in part by the Joint Services
Electronics Program under Contract N00014-79-C-0424, and in part by the Electric Energy Systems Division, Department of Energy,
under Contract DE-AC01-81RA-50658 with Dynamic Systems, P.O. Box 423, Urbana, IL 61801. A preliminary version of this paper
was presented at the 21st IEEE Conference on Decision and Control, Orlando, FL, December 1982.


2.  Problem  formulation 

Consider a two-person deterministic dynamic game in normal form, described by the cost functionals
$J_1(\gamma_1, \gamma_2)$ and $J_2(\gamma_1, \gamma_2, \alpha)$ for player 1 (the leader) and player 2 (the follower), respectively. Here, the
strategies $\gamma_1$ and $\gamma_2$ belong to a priori specified strategy spaces $\Gamma_1$ and $\Gamma_2$, respectively, and $\alpha \in A \subset \mathbb{R}$
denotes a parameter on which the follower's cost functional depends. Let us denote the decision
variables of the leader and the follower by $u \in U = \mathbb{R}^n$ and $v \in V = \mathbb{R}^m$, respectively. In this paper, we
will assume that the follower has open-loop information, and hence take $\Gamma_2 = V$. By an abuse of
notation, we also let $J_1(u, v)$ and $J_2(u, v, \alpha)$ denote the cost functionals over the product space $U \times V$,
for each $\alpha \in A$. Let us further assume that:

(i) $J_2(u, v, \alpha)$ is strictly convex¹ and twice continuously differentiable on $U \times V$, for all $\alpha \in A$;

(ii) the leader has perfect access to the follower's action $v$; and

(iii) the leader is uncertain about the actual value of $\alpha \in A$; however, he designs his strategy
according to a nominal value of $\alpha$, say $\alpha^* \in A$, keeping in mind that $\alpha^*$ may not be the actual value.

Under  this  setup,  the  problem  faced  by  the  leader  is  twofold: 

(1) To design a Stackelberg strategy $\gamma_1^* \in \Gamma_1$ which, by also taking into account rational reactions of
the follower, leads to a desired value of $J_1$, which may be its global minimum over $U \times V$; let us denote
such a value by $J_1^*$. More precisely, one of the objectives of the leader is to find a $\gamma_1^* \in \Gamma_1$ such that

$$J_1(\gamma_1^*, v) = J_1^*, \quad \text{for all } v \in R_{\alpha^*}(\gamma_1^*), \qquad (1a)$$

where $R_\alpha(\gamma_1)$ denotes the optimal reaction set of the follower and is defined for each fixed $\alpha \in A$ by

$$R_\alpha(\gamma_1) = \{ v^* \in V : J_2(\gamma_1, v^*, \alpha) \le J_2(\gamma_1, v, \alpha), \ \forall v \in V \}. \qquad (1b)$$

Under an additional technical restriction which will be delineated in the sequel, it has been shown in
[5] that, for each fixed $\alpha \in A$, there exists an affine strategy for the leader which leads to $J_1^*$. More
precisely, with $\alpha$ fixed at a nominal value $\alpha^*$, there exists an $n \times m$ matrix $Q(\alpha^*)$, satisfying the
relation²

$$\left[ \frac{\partial J_2}{\partial u}\, Q(\alpha^*) + \frac{\partial J_2}{\partial v} \right]_{\alpha = \alpha^*,\ u = u^t,\ v = v^t} = 0, \qquad (2)$$

so that the incentive strategy

$$u = \gamma_1(v) = u^t + Q(\alpha^*)(v - v^t) \qquad (3)$$

forces the follower to $v = v^t$, provided that $\partial J_2(u, v, \alpha)/\partial u$ evaluated at $(u = u^t, v = v^t)$ does not vanish
within an $\varepsilon$-neighborhood of $\alpha^*$, where $(u^t, v^t)$ minimizes $J_1(u, v)$ over $U \times V$.

¹ This restriction will be relaxed later in Section 4.

² All partial derivatives of a scalar with respect to a vector are taken to be row vectors, whereas all other vectors are column
vectors, a convention that we adopt throughout this paper.

In general, equation (2) defines a class $\mathcal{J}$ of $(n \times m)$ matrices which force the follower to $v = v^t$. We
also note that there exist other strategies for the leader which attain $J_1^*$, and they need not be of the
affine form as in (3).

(2) Since the leader does not have perfect knowledge of $\alpha$, the actual value may be different from $\alpha^*$,
in which case the leader will end up with an inferior performance, because $\gamma_1^*$ so defined is optimal only
at $\alpha = \alpha^*$. Therefore, it is highly desirable for the leader to have a robustness property associated with
his optimal strategy. More precisely, the leader would like to have the sensitivity of the realized value of
his cost functional, against the variations in $\alpha$ about its nominal value $\alpha^*$, be as small as possible. This
property may be induced by making use of the intrinsic nonuniqueness of the solutions of (2), and also
by introducing nonlinear strategies which satisfy (1a), as will be shown in the sections to follow.
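
To make the nonuniqueness concrete, the following Python sketch works through a toy instance of our own choosing (it is not taken from the paper). We assume $n = 2$, $m = 1$, a leader cost $J_1 = u_1^2 + u_2^2 + v^2$ (so that the desired point is $u = 0$, $v = 0$), a follower cost $J_2 = \frac{1}{2}(u_1 - 1)^2 + \frac{1}{2}(u_2 - \alpha)^2 + \frac{1}{2}(v - 1)^2$, and a nominal value $\alpha^* = 1$. The leader is assumed to announce the affine strategy $u = Qv$; all function and variable names are hypothetical.

```python
import numpy as np

ALPHA_STAR = 1.0  # assumed nominal value alpha*

def follower_response(Q, alpha):
    """Minimizer over v of the toy J2 along the announced strategy u = Q v.

    Toy J2 (our assumption): 0.5*(u1-1)^2 + 0.5*(u2-alpha)^2 + 0.5*(v-1)^2.
    Substituting u = (q1*v, q2*v) and setting the derivative in v to zero
    gives a linear equation in v, whose solution is returned.
    """
    q1, q2 = Q
    return (q1 + q2 * alpha + 1.0) / (q1**2 + q2**2 + 1.0)

def leader_cost(Q, alpha):
    """Toy J1 = u1^2 + u2^2 + v^2 evaluated at the follower's response."""
    v = follower_response(Q, alpha)
    u = np.asarray(Q, dtype=float) * v
    return float(u @ u + v**2)

# Two gains that both satisfy the follower's stationarity condition at alpha*:
# dJ2/du @ Q + dJ2/dv = [-1, -1] @ Q - 1 = 0 at (u, v) = (0, 0).
Q_a = np.array([0.0, -1.0])
Q_b = np.array([-1.0, 0.0])

for name, Q in (("Q_a", Q_a), ("Q_b", Q_b)):
    # Response at the nominal value, and leader's cost when alpha drifts to 1.2.
    print(name, follower_response(Q, ALPHA_STAR), leader_cost(Q, 1.2))
```

In this toy instance both gains induce $v = 0$ at the nominal value, but only `Q_b` keeps the leader's cost at its minimum when $\alpha$ moves away from $\alpha^*$, which is the kind of selection the robustness criterion is meant to make.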


3.  Introduction  of  a  sensitivity  function  and  derivation  of  robust  solutions 

Let $\gamma_1 \in \Gamma_1$ be an incentive strategy for the leader, and $v_\alpha \in R_\alpha(\gamma_1)$. Towards the goal set in Section 2,
and as a measure of the sensitivity of $J_1(\gamma_1, v_\alpha)$ with respect to deviations of $\alpha$ from its nominal value
$\alpha^*$, let us introduce the total derivative of $J_1(\gamma_1, v_\alpha)$ with respect to $\alpha$, evaluated at $\alpha = \alpha^*$, $u = u^t$ and
$v = v^t$. More precisely, let us confine ourselves to affine strategies of the form (3) and, by abuse of
terminology, let us define the first-order sensitivity function of $J_1(\gamma_1, v_\alpha)$ with respect to $\alpha$, at $\alpha = \alpha^*$,
as

$$I_1(\alpha^*) = \frac{d J_1(\gamma_1(v_\alpha), v_\alpha)}{d\alpha} \bigg|_{\alpha = \alpha^*,\ u = u^t,\ v = v^t}. \qquad (4)$$
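
As a sketch of the chain-rule step behind this definition (our own expansion, assuming the leader's strategy is affine in $v$ with gain $Q(\alpha^*)$ and writing $dv_\alpha/d\alpha$ for the derivative of the follower's response), the total derivative can be written as:

```latex
\frac{d}{d\alpha} J_1\bigl(\gamma_1(v_\alpha), v_\alpha\bigr)
  = \left( \frac{\partial J_1}{\partial u}\, Q(\alpha^*)
         + \frac{\partial J_1}{\partial v} \right) \frac{dv_\alpha}{d\alpha},
\qquad
\gamma_1(v) = u^t + Q(\alpha^*)\,(v - v^t).
```

Here the row-vector convention for partial derivatives of a scalar is used, so the bracketed factor is a $1 \times m$ row and $dv_\alpha/d\alpha$ is an $m$-vector for scalar $\alpha$.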


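However, since the pair $(u^t, v^t)$ globally minimizes $J_1$ on the product space $U \times V$, $I_1(\alpha^*)$ vanishes.
Then, the next term in the Taylor expansion of $J_1(\gamma_1, v_\alpha)$ around $\alpha^*$, which we call the second-order
sensitivity function of $J_1(\gamma_1, v_\alpha)$ with respect to $\alpha$, will have to be considered. Denoting it by $I_2(\alpha^*)$, and
suppressing the arguments, we have:

$$
\begin{aligned}
I_2(\alpha^*) &= \frac{d^2 J_1(\gamma_1(v_\alpha), v_\alpha)}{d\alpha^2} \bigg|_{\alpha = \alpha^*,\ u = u^t,\ v = v^t} \\
&= \left( \frac{dv}{d\alpha} \right)' \left[ Q' \frac{\partial^2 J_1}{\partial u^2} Q + Q' \frac{\partial^2 J_1}{\partial v\, \partial u} + \frac{\partial^2 J_1}{\partial u\, \partial v} Q + \frac{\partial^2 J_1}{\partial v^2} \right] \left( \frac{dv}{d\alpha} \right) \bigg|_{\alpha = \alpha^*,\ u = u^t,\ v = v^t} \\
&\quad + \left( \frac{\partial J_1}{\partial u} Q + \frac{\partial J_1}{\partial v} \right) \frac{d^2 v}{d\alpha^2} \bigg|_{\alpha = \alpha^*,\ u = u^t,\ v = v^t}.
\end{aligned}
\qquad (5)
$$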

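In order to find an expression for $dv/d\alpha$, we note that the equation

$$\left[ \frac{\partial J_2}{\partial u} Q + \frac{\partial J_2}{\partial v} \right]_{u = \gamma_1(v_\alpha),\ v = v_\alpha} = 0 \qquad (6)$$

is an identity for all $\alpha \in A$, and it completely specifies the optimal response of the follower when
affine strategies $\gamma_1$ of the form (3) are used. The derivative of the above expression with respect to $\alpha$
would still vanish for all $\alpha \in A$. We then have, with $Q = Q(\alpha^*)$:

$$
\left[ Q' \frac{\partial^2 J_2}{\partial u^2} Q + Q' \frac{\partial^2 J_2}{\partial v\, \partial u} + \frac{\partial^2 J_2}{\partial u\, \partial v} Q + \frac{\partial^2 J_2}{\partial v^2} \right] \frac{dv}{d\alpha}
+ Q' \left( \frac{\partial^2 J_2}{\partial \alpha\, \partial u} \right)' + \left( \frac{\partial^2 J_2}{\partial \alpha\, \partial v} \right)' = 0, \quad \forall \alpha \in A. \qquad (7)
$$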


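Since $J_2(u, v, \alpha)$ is strictly convex in $u$ and $v$ for all $\alpha \in A$, the coefficient matrix of $(dv/d\alpha)$ is positive
definite, and thereby invertible.³ Combining (5) and (7) we obtain the second-order sensitivity function
of $J_1(\gamma_1, v_\alpha)$ with respect to $\alpha$, at $\alpha = \alpha^*$, to be:

$$
\begin{aligned}
I_2(\alpha^*) = {} & \left( \frac{\partial^2 J_2}{\partial \alpha\, \partial u} Q + \frac{\partial^2 J_2}{\partial \alpha\, \partial v} \right)
\left[ Q' \frac{\partial^2 J_2}{\partial u^2} Q + Q' \frac{\partial^2 J_2}{\partial v\, \partial u} + \frac{\partial^2 J_2}{\partial u\, \partial v} Q + \frac{\partial^2 J_2}{\partial v^2} \right]^{-1} \\
& \times \left[ Q' \frac{\partial^2 J_1}{\partial u^2} Q + Q' \frac{\partial^2 J_1}{\partial v\, \partial u} + \frac{\partial^2 J_1}{\partial u\, \partial v} Q + \frac{\partial^2 J_1}{\partial v^2} \right] \\
& \times \left[ Q' \frac{\partial^2 J_2}{\partial u^2} Q + Q' \frac{\partial^2 J_2}{\partial v\, \partial u} + \frac{\partial^2 J_2}{\partial u\, \partial v} Q + \frac{\partial^2 J_2}{\partial v^2} \right]^{-1}
\left( \frac{\partial^2 J_2}{\partial \alpha\, \partial u} Q + \frac{\partial^2 J_2}{\partial \alpha\, \partial v} \right)' \bigg|_{\alpha = \alpha^*,\ u = u^t,\ v = v^t}.
\end{aligned}
\qquad (8)
$$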
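
As a numerical sanity check on the second-order sensitivity function, the sketch below evaluates it for scalar $\alpha$ in two steps: it first solves the differentiated optimality condition (7) for $dv/d\alpha$, and then forms the quadratic term of (5), the remaining term being zero at a global minimizer of $J_1$. The toy data are our own assumption and not from the paper: $J_1 = u_1^2 + u_2^2 + v^2$, $J_2 = \frac{1}{2}(u_1 - 1)^2 + \frac{1}{2}(u_2 - \alpha)^2 + \frac{1}{2}(v - 1)^2$, nominal point $u = 0$, $v = 0$, $\alpha^* = 1$. The function name and argument layout are hypothetical.

```python
import numpy as np

def second_order_sensitivity(Q, H1, H2, d2J2_dadu, d2J2_dadv):
    """Second-order sensitivity for scalar alpha under u = u_t + Q (v - v_t).

    H1, H2      : Hessians of J1 and J2 in the stacked variable (u, v),
                  evaluated at the nominal point, shape (n+m, n+m).
    d2J2_dadu   : mixed partial of J2 in alpha and u, row of shape (1, n).
    d2J2_dadv   : mixed partial of J2 in alpha and v, row of shape (1, m).
    """
    n, m = Q.shape
    T = np.vstack([Q, np.eye(m)])        # d(u, v)/dv along the affine strategy
    M1 = T.T @ H1 @ T                    # curvature of J1 along the strategy
    M2 = T.T @ H2 @ T                    # coefficient matrix of dv/dalpha
    b = d2J2_dadu @ Q + d2J2_dadv        # (1, m) row
    dv_dalpha = -np.linalg.solve(M2, b.T)
    return (dv_dalpha.T @ M1 @ dv_dalpha).item()

# Toy data (assumed): constant Hessians and mixed partials of the costs above.
H1 = 2.0 * np.eye(3)
H2 = np.eye(3)
d2J2_dadu = np.array([[0.0, -1.0]])
d2J2_dadv = np.array([[0.0]])

I2_a = second_order_sensitivity(np.array([[0.0], [-1.0]]), H1, H2, d2J2_dadu, d2J2_dadv)
I2_b = second_order_sensitivity(np.array([[-1.0], [0.0]]), H1, H2, d2J2_dadu, d2J2_dadv)
print(I2_a, I2_b)
```

For the first gain the follower's response in this toy problem is $v_\alpha = (1 - \alpha)/2$, so the leader's cost along the response is $(1 - \alpha)^2/2$ and its second derivative agrees with the value returned for `I2_a`; the second gain annihilates the mixed-partial row `b` and hence the sensitivity.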


Remark 1. When the leader enforces his team solution, his objective will be to minimize $I_2(\alpha^*)$ over $\mathcal{J}$,
since $I_1(\alpha^*)$ vanishes. On the other hand, there may be cases when he would prefer to enforce a point
other than $(u^t, v^t)$, in which case $I_1(\alpha^*)$ will not necessarily be zero. However, when the nonvanishing
$I_1(\alpha^*)$ does not depend on the choice of $Q$ from the class $\mathcal{J}$ of $n \times m$ matrices which, together with an
affine strategy of the form (3), enforce the follower to the desired point, one still has to consider $I_2(\alpha^*)$
as a measure for obtaining minimally sensitive solutions. This point will be further elucidated in Section
6. □

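The problem now is to minimize $I_2(\alpha^*)$ over all $(n \times m)$ real matrices $Q(\alpha^*)$ subject to the constraint:

$$\left[ \frac{\partial J_2}{\partial u}\, Q(\alpha^*) + \frac{\partial J_2}{\partial v} \right]_{\alpha = \alpha^*,\ u = u^t,\ v = v^t} = 0. \qquad (9)$$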


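We first observe that $I_2(\alpha^*) \ge 0$, and hence a lower bound for $I_2(\alpha^*)$ is zero. We will now show that
this lower bound is tight for a fairly large class of problems. Towards this end, let us note that this lower
bound is reached when

$$\frac{\partial^2 J_2}{\partial \alpha\, \partial u}\, Q(\alpha^*) + \frac{\partial^2 J_2}{\partial \alpha\, \partial v} = 0 \qquad (10)$$

subject to (9). These two equations may be combined and written as:

$$
\left[ \begin{pmatrix} \partial^2 J_2 / \partial \alpha\, \partial u \\ \partial J_2 / \partial u \end{pmatrix} Q(\alpha^*)
+ \begin{pmatrix} \partial^2 J_2 / \partial \alpha\, \partial v \\ \partial J_2 / \partial v \end{pmatrix} \right]_{\alpha = \alpha^*,\ u = u^t,\ v = v^t} = 0. \qquad (11)
$$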


³ For invertibility it is, of course, sufficient that the Hessian matrix of $J_2$ be full rank.
Let $q_i$, $i = 1, 2, \ldots, m$, denote the $i$th column of $Q(\alpha^*)$. Then, (11) represents a collection of $m$ sets of
linear algebraic equations, with each set consisting of two equations with $n$ unknowns, which correspond
to entries of $q_i$. Thus, in order to have at least one solution to (11), it is sufficient that:

$$\operatorname{rank} \begin{pmatrix} \partial^2 J_2 / \partial \alpha\, \partial u \\ \partial J_2 / \partial u \end{pmatrix} = 2. \qquad (12)$$


For this requirement to be met, it is, of course, necessary that $\dim(u) \ge 2$. Suppose (12) is satisfied, and
let $Q^0(\alpha^*)$ denote a solution to (11).⁴ Then, an affine strategy given by (3), with $Q(\alpha^*) = Q^0(\alpha^*)$, makes
the term $dv/d\alpha$ defined by (7) vanish at the nominal solution point. By this token, the sensitivity
function of order 3, i.e.

$$\frac{d^3 J_1(\gamma_1(v_\alpha), v_\alpha)}{d\alpha^3} \bigg|_{\alpha = \alpha^*,\ u = u^t,\ v = v^t},$$

is annihilated at the nominal solution point, since it carries (by the chain rule) the product term $dv/d\alpha$. In
other words, the third-order Taylor approximation of the effect of a perturbation in the value of $\alpha$ on
$J_1(\gamma_1(v_\alpha), v_\alpha)$ is zero within an $\varepsilon$-neighborhood of the nominal solution point. Therefore, the affine strategy

$$\gamma_1^0(v) = u^t + Q^0(\alpha^*)(v - v^t), \qquad (13)$$

where $Q^0(\alpha^*)$ satisfies (11), has very appealing sensitivity properties.
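
A robust gain can be computed numerically by solving the stacked linear system (11) directly. The sketch below is a minimal illustration under our own assumptions: the derivatives of $J_2$ at the nominal point are supplied as arrays, the system is solved in the least-squares sense with NumPy (which returns the minimum-norm solution when $n > 2$), and the toy numbers correspond to the hypothetical cost $J_2 = \frac{1}{2}(u_1 - 1)^2 + \frac{1}{2}(u_2 - \alpha)^2 + \frac{1}{2}(v - 1)^2$ at $u = 0$, $v = 0$, $\alpha^* = 1$. The function name `robust_gain` is ours.

```python
import numpy as np

def robust_gain(dJ2_du, dJ2_dv, d2J2_dadu, d2J2_dadv):
    """Solve the stacked system A Q + B = 0 for the gain Q.

    A stacks the mixed partial (alpha, u) on top of dJ2/du, shape (r+1, n);
    B stacks the mixed partial (alpha, v) on top of dJ2/dv, shape (r+1, m).
    All inputs are 2-D arrays evaluated at the nominal point.
    """
    A = np.vstack([d2J2_dadu, dJ2_du])
    B = np.vstack([d2J2_dadv, dJ2_dv])
    Q, *_ = np.linalg.lstsq(A, -B, rcond=None)
    if not np.allclose(A @ Q + B, 0.0):
        raise ValueError("the stacked system admits no exact solution")
    return Q

# Toy derivatives (assumed) at the nominal point.
dJ2_du = np.array([[-1.0, -1.0]])
dJ2_dv = np.array([[-1.0]])
d2J2_dadu = np.array([[0.0, -1.0]])
d2J2_dadv = np.array([[0.0]])

Q0 = robust_gain(dJ2_du, dJ2_dv, d2J2_dadu, d2J2_dadv)
print(Q0)
```

Least squares is chosen here only for convenience; when the stacked coefficient matrix is square and nonsingular, as in this toy case, a direct linear solve gives the same gain.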

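In the preceding discussion, we confined ourselves to a scalar parameter $\alpha$. We now relax this
condition, by letting $\alpha \in \mathbb{R}^r$. In this case $dv/d\alpha$ becomes an $(r \times m)$ matrix. If all the entries of $dv/d\alpha$
vanish at the nominal solution point, so will the entries of the sensitivity functions

$$\frac{d^p J_1(\gamma_1(v_\alpha), v_\alpha)}{d\alpha^p}, \quad p = 1, 2, 3.$$

This is the case if $Q(\alpha^*)$ satisfies (11) with $\partial^2 J_2 / \partial \alpha\, \partial u$ and $\partial^2 J_2 / \partial \alpha\, \partial v$ being $(r \times n)$ and $(r \times m)$ matrices,
respectively. A sufficient condition for this (replacing (12)) is

$$\operatorname{rank} \begin{pmatrix} \partial^2 J_2 / \partial \alpha\, \partial u \\ \partial J_2 / \partial u \end{pmatrix} = r + 1. \qquad (14)$$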


We now summarize these results in the following proposition.

Proposition 1. Let (14) be satisfied. Then (i) there exists at least one $Q(\alpha^*)$ (denoted $Q^0(\alpha^*)$) satisfying
(11), and (ii) the affine strategy

$$\gamma_1(v) = u^t + Q^0(\alpha^*)(v - v^t)$$

induces the follower to play $v = v^t$ when $\alpha = \alpha^*$, and makes the first-, second- and third-order derivatives
of $J_1(\gamma_1(v_\alpha), v_\alpha)$ with respect to the $r$-vector $\alpha$ vanish at the solution point. □
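
The rank condition of Proposition 1 is easy to test numerically before attempting to solve for a robust gain. The following sketch is an illustration only: the helper name is hypothetical, the inputs are assumed to be the derivatives of $J_2$ evaluated at the nominal point, and the rank is computed with NumPy's SVD-based routine, whose default tolerance is an implementation choice rather than part of the theory.

```python
import numpy as np

def rank_condition_holds(d2J2_dadu, dJ2_du):
    """Check whether the stacked matrix has rank r + 1.

    d2J2_dadu : mixed partial of J2 in alpha and u, shape (r, n).
    dJ2_du    : partial of J2 in u as a row vector, shape (1, n).
    """
    A = np.vstack([np.atleast_2d(d2J2_dadu), np.atleast_2d(dJ2_du)])
    r = A.shape[0] - 1
    return bool(np.linalg.matrix_rank(A) == r + 1)

# Scalar alpha, n = 2: the two rows are independent, so the condition holds.
ok_generic = rank_condition_holds([[0.0, -1.0]], [[-1.0, -1.0]])

# Mixed partial identically zero: the stacked matrix loses rank.
ok_zero_mixed = rank_condition_holds([[0.0, 0.0]], [[-1.0, -1.0]])

# r = 2 but n = 2: three rows cannot be independent in two columns.
ok_too_few_controls = rank_condition_holds([[0.0, -1.0], [1.0, 0.0]], [[-1.0, -1.0]])

print(ok_generic, ok_zero_mixed, ok_too_few_controls)
```

The third case reflects the dimension requirement implied by the condition: the leader needs at least $r + 1$ control components for the stacked matrix to reach rank $r + 1$.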


⁴ When (11) admits more than one solution, each of them fulfills the robustness criterion set above. In this case, one can pursue
further analysis to minimize higher order sensitivity functionals within the class of these robust solutions.
4.  Singular  incentive  problems

In the previous section the least sensitive incentive design problem was solved by choosing $Q(\alpha^*)$
such that $dv/d\alpha$ is annihilated at the nominal solution point. The approach adopted in that section fails
to be applicable when the cost functional of the follower is separable, as

$$J_2(u, v, \alpha) = g_1(u, v) + g_2(v, \alpha), \qquad (15)$$

in which case the term $\partial^2 J_2 / \partial \alpha\, \partial u$ vanishes. A partial remedy for this class of problems can be worked
out by choosing the entries of $Q(\alpha^*)$ as large as possible subject to (9) and some self-imposed bounds
[13]. Among this class, however, the so-called singular incentive problems are of particular interest. On
the one hand, they lead to analytical inner point solutions which are quite appealing (see below the
example of Section 6); on the other hand, they provide a convenient setting for relaxing the assumption
on the strict convexity of $J_2$ on the product space $U \times V$.

Towards this goal, let us assume that the leader's control affects the follower's cost functional linearly.
By analogy with singular control, this class of problems will be referred to as singular incentive
problems. In a singular incentive problem, the follower's cost functional is not convex on the product
space $U \times V$, and therefore the previous theory on linear incentive problems is not directly applicable;
however, an extension is possible, as we elucidate in the sequel. First note that when the leader
announces an affine strategy of the form (3), the follower's cost functional $J_2$ becomes a function of only
$v$, for a given $\alpha \in A$. Accordingly, if the functional $J_2(u^t + Q(\alpha^*)(v - v^t), v, \alpha)$ is strictly convex on $V$ for
each $\alpha \in A$, and if equation (2) admits a nonempty family $\mathcal{J}$ of solutions at the desired point $(u^t, v^t)$,
then the least-sensitive incentive design problem becomes meaningful for this particular class of cost
functionals. More precisely, for a given $u^t \in U$, let


<?(«*)+(?'(«*) 


v.  a) 
dvdu 


(lb) 


for all $v \in V$, $\alpha \in A$ and $Q(\alpha^*) \in \hat{\mathcal{Q}} \subset \mathcal{Q}$, where $\hat{\mathcal{Q}}$ is assumed to be nonvoid. Under these assumptions, the optimal reaction of the follower to any announced strategy of the form (3), where $Q(\alpha^*) \in \hat{\mathcal{Q}}$, is $v = v^t$, which is confirmed by (2) as a first-order necessary condition, and (16) provides a second-order sufficient condition for the optimality of $v = v^t$.

As we indicated above, for this class of incentive design problems it is possible to obtain an inner-point solution analytically because of the particular structure of the sensitivity function. Towards this end, let us assume that $n = 2$, $m = 1$, and let $q_1$ and $q_2$ denote the entries of $Q(\alpha^*)$. Using (W), $I_2(\alpha^*)$ becomes:


d2J> 


■ 

dltjdv 


d'-Ji 

du\du2 


d2J\ 


du 


7<?2' 


d2Jx  1 

r  d2h 

dv2\ 

L  dadv, 

7  92Jl  _ 


d2J2 

du,d  V 


(U) 


where S and £ are defined by;

The derivative of $I_2(\alpha^*)$ with respect to $q_2$ vanishes at


<?:  = 


*  flu-,  *  flu, A v 


fl'-J,  £Jx 

flv 


+  *:j‘ 

fl'-J ,  1[*V2 

fl2J  2 

flu2fl  V 

*  flu,flu2\[  flv 2 

5  flu, flv 

fl'-J,  ] 

\fl'-J2  fl'-J2  1 

[  fl'-J, 

+  fluil 

L  <9f:  *  flu,  flv  J 

K-ST 

flv-  J  l  fluiflv  flu,  flv  ]  l  flu  1  flu,  A 

fl'-Jl 


d‘J\ 
dv 


52  iLJj. . 
flu. 


28 


flu,flu2 


flzJ,  fl'J, 


flu  1  flv  flu2flv 


fl'-J  1 
flu,  flu 


-If— 4- 

2  Jl  flu2flv 


fl2J2 

flUiflv 


(19) 


0*0 

uxul 


provided that the denominator of (19) is not zero. We then readily have the following proposition.


Proposition 2. The minimum-sensitive (robust) linear incentive design problem formulated in this section admits a unique solution $(q_1^*, q_2^*)$, where $q_2^*$ is given by (19) and $q_1^*$ is obtained through the linear constraint (9) which relates $q_1$ to $q_2$, provided that the denominator of (19) does not vanish and $(q_1^*, q_2^*) \in \hat{\mathcal{Q}}$.


Proof. This result follows from the following four properties of the function $F(q_2) = I_2(\alpha^*)$:

(i) There exists a finite number $M$ such that

$$\lim_{q_2 \to +\infty} F(q_2) = \lim_{q_2 \to -\infty} F(q_2) = M.$$

(ii) $F(q_2) \to +\infty$ as:


d2M  /\  Alh 
flV7)/  [flu2flv 


s 

=  <?: 


(iii) $F(q_2)$ is continuous, except at $q_2 = \bar{q}_2$.

(iv) $F(q_2)$ has a single stationary point $q_2^*$.

These four properties readily lead to the conclusion that $q_2^*$ is the unique minimizing solution for $F$. □


5. Use of nonlinear strategies in sensitivity considerations

The analysis of previous sections was confined to the class of linear strategies. Although this class is rich enough to provide optimal least sensitive solutions to incentive design problems, use of nonlinear strategies may provide additional degrees of freedom, especially when $Q(\alpha^*)$ is determined uniquely through (9). Specifically, when $\dim u = 1$ the constraint (9) determines $Q(\alpha^*)$ uniquely, and hence the set $\mathcal{Q}$ is a singleton and it does not allow sensitivity considerations. However, if the leader is permitted to enlarge his strategy space by including a suitable nonlinear term in his control, he may have extra degrees of freedom to reduce the sensitivity of his performance to changes in the uncertain parameter in the follower's cost functional. Towards this end, let us assume that the strategy space of the leader is the set of all mappings from $V$ into $U$ which are twice continuously differentiable at the nominal solution point. For $n = m = 1$, $I_2(\alpha^*)$ becomes:

$$I_2(\alpha^*) = \frac12\left[\frac{\partial^2 J_1}{\partial u^2}\left(\frac{d\gamma_1}{dv}\right)^2 + 2\,\frac{\partial^2 J_1}{\partial u\,\partial v}\,\frac{d\gamma_1}{dv} + \frac{\partial^2 J_1}{\partial v^2}\right] \times \frac{\left(\dfrac{\partial^2 J_2}{\partial u\,\partial\alpha}\,\dfrac{d\gamma_1}{dv} + \dfrac{\partial^2 J_2}{\partial v\,\partial\alpha}\right)^2}{\left(\dfrac{\partial^2 J_2}{\partial u^2}\left(\dfrac{d\gamma_1}{dv}\right)^2 + 2\,\dfrac{\partial^2 J_2}{\partial u\,\partial v}\,\dfrac{d\gamma_1}{dv} + \dfrac{\partial J_2}{\partial u}\,\dfrac{d^2\gamma_1}{dv^2} + \dfrac{\partial^2 J_2}{\partial v^2}\right)^2}. \qquad (20a)$$

Here, $d\gamma_1/dv$ is completely determined from

$$\frac{\partial J_2}{\partial u}\,\frac{d\gamma_1}{dv} + \frac{\partial J_2}{\partial v}\bigg|_{u=u^t,\,v=v^t} = 0. \qquad (20b)$$


The optimization of $I_2(\alpha^*)$ may require $d^2\gamma_1/dv^2$ to take arbitrarily large values. Such a strategy may give rise to arbitrarily large values of $u$ for finite values of $v$. Since the affordability and credibility of such a strategy is questionable, it is necessary to impose bounds on $d^2\gamma_1/dv^2$ such as

$$\left|\frac{d^2\gamma_1}{dv^2}\right| \le k, \qquad k \in \mathbb{R}^+. \qquad (21)$$


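Under such a constraint, the minimizing argument of $I_2(\alpha^*)$ is given by:

$$\left(\frac{d^2\gamma_1}{dv^2}\right)^* = k\,\operatorname{sgn}\!\left(\frac{\partial J_2}{\partial u}\right), \qquad (22)$$

where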


$$\operatorname{sgn}(x) = \begin{cases} +1, & x > 0 \\ 0, & x = 0 \\ -1, & x < 0. \end{cases} \qquad (23)$$


Here,  all  the  partials  are  evaluated  at  the  nominal  point.  These  results  then  lead  to  the  following 
proposition. 

Proposition  3.  A  representation  of  the  least  sensitive  incentive  strategy  within  the  class  of  policies  which 
are  twice  continuously  differentiable  at  the  nominal  solution  point  is  given  by: 


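$$\gamma_1^*(v) = u^t - \frac{\partial J_2/\partial v}{\partial J_2/\partial u}\,(v - v^t) + \frac12\,\operatorname{sgn}\!\left(\frac{\partial J_2}{\partial u}\right) k\,(v - v^t)^2, \qquad (24)$$

where all the partials are evaluated at the nominal solution point.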
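The effect of the curvature term can be illustrated numerically. The following is a minimal sketch, not the authors' computation: it evaluates the scalar second-order sensitivity expression (20a) for a hypothetical set of partial derivatives (those of $J_1 = u^2 + v^2$ and $J_2 = \tfrac12(u-1)^2 + \tfrac12(v-\alpha)^2$ at $u^t = v^t = 0$, $\alpha^* = 1$), and it assumes that the sign argument in (22) is $\partial J_2/\partial u$. The dictionary keys and the bound `k = 1` are illustrative names and values only.

```python
def sgn(x):
    """Sign function as in (23)."""
    return (x > 0) - (x < 0)


def second_order_sensitivity(slope, curvature, J1, J2):
    """Evaluate the scalar expression (20a).

    slope     = d(gamma_1)/dv at the nominal point
    curvature = d^2(gamma_1)/dv^2 at the nominal point
    J1, J2    = dicts of partial derivatives at the nominal point
    """
    leader_term = J1["uu"] * slope**2 + 2.0 * J1["uv"] * slope + J1["vv"]
    alpha_term = (J2["ua"] * slope + J2["va"]) ** 2
    denom = (J2["uu"] * slope**2 + 2.0 * J2["uv"] * slope
             + J2["u"] * curvature + J2["vv"]) ** 2
    return 0.5 * leader_term * alpha_term / denom


# Hypothetical partials (assumed example, not taken from the paper).
J1 = {"uu": 2.0, "uv": 0.0, "vv": 2.0}
J2 = {"u": -1.0, "v": -1.0, "uu": 1.0, "uv": 0.0, "vv": 1.0,
      "ua": 0.0, "va": -1.0}

slope = -J2["v"] / J2["u"]          # fixed by (20b)
k = 1.0                              # assumed curvature bound in (21)
best_curvature = k * sgn(J2["u"])    # assumed reading of (22)

i2_affine = second_order_sensitivity(slope, 0.0, J1, J2)
i2_robust = second_order_sensitivity(slope, best_curvature, J1, J2)
i2_wrong_sign = second_order_sensitivity(slope, -best_curvature, J1, J2)

if __name__ == "__main__":
    print(i2_affine, i2_robust, i2_wrong_sign)
```

In this example the purely affine strategy gives a sensitivity of 0.5, the bounded quadratic term with the chosen sign lowers it, and the opposite sign raises it.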


6.  An  example  from  economics 

In this section we discuss an example from microeconomics, which illustrates some of the results obtained in this paper, especially those on singular incentive problems. Let us consider a duopolistic market where the leader, P1, produces two goods, X and Y, and the follower produces a single good, Z. All three goods are substitutable and they are sold at the same price $p$ which is assumed to satisfy the linear demand relation:

$$x + y + z = d_0 - d_1 p,$$

where $x$, $y$ and $z$ represent the quantities to be produced from each good X, Y and Z, respectively; $d_0$ and $d_1$ are positive constants. It is also assumed that each firm has a general positive, increasing and twice continuously differentiable cost function $l_i(\cdot)$.

Then, the profit functions of the firms, which are to be maximized, become:

$$\pi_1 = p\,(x + y) - l_1(x, y), \qquad \pi_2 = p\,z - l_2(z, c), \qquad (26a)$$

$$x, y, z \ge 0, \qquad (26b)$$

where $c$ is a positive parameter, whose role will be clarified in the sequel. Furthermore, we let $\gamma_1(z)$ denote a two-dimensional policy vector of firm 1, i.e. $(x, y)' = \gamma_1(z)$, whereas firm 2's policy is the static vector $z$.

For this problem it can be shown that there exists a sequence of closed-loop policies for P1 which force the follower to $z = 0$ [1], a result which is valid for any demand relation in which the price is a strictly decreasing function of $z$. However, for the leader, the limiting strategy is not well defined since it requires the gain vector $Q(\alpha^*)$ to have unbounded elements (i.e. infinite threat). In general, such a strategy is neither credible for the follower, nor affordable to the leader. In its stead, an alternative is to assume that P1 imposes a suitable point of equilibrium, compatible with the duopolistic nature of the problem.

Towards this end, let the objective functional considered by the leader be

$$J_1 = \nu\,\pi_1 + (1 - \nu)\,\pi_2, \qquad 0 < \nu < 1, \qquad (27)$$

where $\nu$ is large enough to provide the leader better profit, but not so large as to lead to a noncredible incentive scheme. A possible upper bound for $\nu$ may be the one which provides the follower a profit comparable with what he would make in a Nash equilibrium case. The objective functional of the follower is still $\pi_2$. Let us assume, for simplicity in the analysis to follow, that $\nu = \tfrac12$, in which case the leader can roughly guarantee two-thirds of the market.⁵ To complete the formulation of the problem, let us assume that the parameter in the cost of production of Z, namely $c$, is uncertain to the leader. However, he knows a nominal value of $c$, say $c^*$, around which $c$ may vary.⁶ The goal of the leader is to design a strategy, using his strategic variables $x$ and $y$, which will enforce the follower to the maximizing arguments of $J_1$ when $c = c^*$; and in addition he seeks a strategy under which his profit function is least sensitive to variations in $c$. Let us also assume that the leader confines himself to affine strategies. This problem is within the scope of singular incentive problems discussed in Section 4, with, however, two exceptions. One of them is that the desired solution point is not the team solution of the leader's profit function $\pi_1$, for which the least-sensitivity property is sought. In this case, the first-order sensitivity function $I_1(\alpha^*)$ does not vanish. Nevertheless, it is a straightforward task to show that $I_1(\alpha^*)$ is invariant under the choice of $Q(\alpha^*) \in \mathcal{Q}$. Hence, the quantity we seek to minimize is still $I_2(\alpha^*)$.

The other distinct feature of this example is that the decisions of the agents are restricted to be non-negative, in contrast with the previous theory which requires the decision spaces to be appropriate dimensional Euclidean spaces. However, it will be clear from the analysis in the sequel that for the problem treated in this section, the above restriction does not invalidate the use of the previous theory.

⁵ This ratio of course depends on the relative structures and magnitudes of the cost functions $l_1$ and $l_2$. When both firms have the same kind of cost function this ratio will be exactly two-thirds.

⁶ Note that $c$ plays the role of $\alpha$ introduced in the previous sections.

Let us assume that $l_1(x, y)$ and $l_2(z, c^*)$ in (26) are such that $J_1$ is strictly concave on $X \times Y \times Z$, and let $(x^*, y^*, z^*)$ denote the triplet which globally maximizes $J_1$.

Although we assumed that the leader should be satisfied with the profit corresponding to the product level $(x^*, y^*, z^*)$, the optimal reaction of the follower to $(x^*, y^*)$ is of course different from $z^*$, since his ultimate goal is to maximize $\pi_2$, not $\pi_1 + \pi_2$, wherein $z^*$ maximizes the latter given that $x = x^*$, $y = y^*$ and $c = c^*$. In order to enforce the desired triplet $(x^*, y^*, z^*)$, it is assumed that P1 announces an affine policy of the form:


$$\gamma_1(z) = \big(x^* + q_1(z - z^*),\; y^* + q_2(z - z^*)\big)', \qquad (28)$$

where $q_1$ represents the coefficient which indicates how P1 would modify $x$ if $z \neq z^*$, and similarly, $q_2$ relates the change in $z$ to $y$. The affine strategy (28) will induce the follower to produce the amount $z^*$ when $q_1$ and $q_2$ satisfy the constraint (2), and in this duopoly problem it takes the form:

$$q_1 + q_2 = (x^* + y^*)/z^*. \qquad (29)$$

When the uncertain parameter $c$ takes its nominal value $c^*$, any pair $(q_1, q_2)$ satisfying (29) will induce the desired outcome. We will explore this nonuniqueness by choosing the pair $(q_1^*, q_2^*)$ which renders the profit $\pi_1$ of P1 least sensitive to variations in the uncertain parameter $c$ about its nominal value $c^*$. To gain more insight into the problem, we will now assume a specific form for the cost functions $l_1(x, y)$ and $l_2(z, c)$. Towards this goal, we let

$$l_1 = \tfrac12 c_1 x^2 + \tfrac12 c_2 y^2; \qquad l_2 = \tfrac12 c z^2. \qquad (30a, 30b)$$

Here, $c > 0$ is the uncertain parameter, with a nominal value $c^*$, which determines the cost of producing $z$ units of good Z. Likewise, $c_1$ and $c_2$ determine the cost the leader acquires when he produces $x$ and $y$ units of X and Y, respectively. When the uncertain parameter $c$ takes its nominal value $c^*$, the affine strategy (28), with $(q_1, q_2)$ satisfying

$$q_1 + q_2 = \frac{c_1 c^* + c_2 c^*}{c_1 c_2}, \qquad (31)$$

induces the follower to choose $z = z^*$. Among this class, the pair $(q_1^*, q_2^*)$ which renders the leader's profit least sensitive to variations in $c$ around $c^*$ is computed from (19) and (31) to be:

$$q_1^* = c^*/c_1; \qquad q_2^* = c^*/c_2. \qquad (32a, 32b)$$

Here, it is seen that in order to reach a robust incentive scheme, the leader should allocate the incentive among the goods he produces in inverse proportion to their respective costs of production.
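The relations (29), (31) and (32) can be checked on numbers. The sketch below is an illustration under stated assumptions, not the authors' computation: it takes the linear demand $x + y + z = d_0 - d_1 p$, the quadratic costs (30), and $\nu = \tfrac12$ (so that the desired point maximizes $\pi_1 + \pi_2$), and it uses hypothetical values $d_0 = 100$, $d_1 = 1$, $c_1 = 1$, $c_2 = 2$ together with the nominal $c^* = 5$. The closed forms in the functions follow from the first-order conditions of these quadratic problems.

```python
def team_optimum(d0, d1, c1, c2, c):
    """Maximizer (x*, y*, z*) of pi_1 + pi_2 for quadratic costs.

    First-order conditions: (d0 - 2*(x+y+z))/d1 = c1*x = c2*y = c*z.
    """
    k = 1.0 / c1 + 1.0 / c2 + 1.0 / c
    lam = d0 / (d1 + 2.0 * k)      # common marginal value
    return lam / c1, lam / c2, lam / c


def robust_pair(c1, c2, c_star):
    """Least sensitive gains (32a), (32b)."""
    return c_star / c1, c_star / c2


def follower_best_response(q1, q2, c, xs, ys, zs, d0, d1):
    """Maximizer of pi_2 over z when P1 plays the affine policy (28).

    With b = 1 + q1 + q2, pi_2(z) = z*(a - b*z)/d1 - 0.5*c*z**2,
    a concave quadratic in z whenever b > 0.
    """
    b = 1.0 + q1 + q2
    a = d0 - (xs + ys + zs) + b * zs
    return a / (2.0 * b + c * d1)


# Hypothetical market parameters; only c* = 5 appears in the text.
d0, d1, c1, c2, c_star = 100.0, 1.0, 1.0, 2.0, 5.0

xs, ys, zs = team_optimum(d0, d1, c1, c2, c_star)
q1, q2 = robust_pair(c1, c2, c_star)

if __name__ == "__main__":
    print("q1 + q2          =", q1 + q2)
    print("(x* + y*) / z*   =", (xs + ys) / zs)
    print("follower's z     =",
          follower_best_response(q1, q2, c_star, xs, ys, zs, d0, d1))
    print("desired z*       =", zs)
```

Under these assumptions the robust pair satisfies the constraint (29) exactly, and the follower's best response at $c = c^*$ coincides with $z^*$.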

In the preceding example, the decisions of the agents were restricted to belong to $\mathbb{R}^+$. Owing to the smoothness properties of the profit functions and incentive strategies, incremental variations in the value of $c$ around $c^*$ (which was the framework for the preceding analysis) cannot drive the equilibrium triplet $(x^*, y^*, z^*)$ too far to violate this restriction. In fact, the extreme case occurs when $c$ becomes so big that it discourages P2 from producing anything. When his competitor leaves the market, the leader has, of course, no motivation to implement his cooperative strategy (28). For the sake of completeness, let us assume that the leader sticks to (28), with $(q_1, q_2)$ satisfying (29). When $z = 0$, we have

$$x + y = 0$$

for all such pairs. When they satisfy the equation of the least sensitive pair (32), $x$ and $y$ individually vanish for $z = 0$; hence, the possibility for a negative $x$ or $y$ is avoided, even for this extreme case.

Fig. 1. A comparison of optimal profit functions of P1 under nonincremental variations in the value of the parameter $c$ around its nominal value $c^*$.

As an illustration of the robustness properties of (32), the profit incurred by P1 is plotted in Fig. 1 against the possible values of the uncertain parameter $c$, for a given set of values of the parameters of the game. The solid line represents the profit made by P1 when the robust strategy (28), (32) is used. The dashed line stands for P1's profit when he allocates all the incentive to a single good. When the uncertain parameter assumes its nominal value $c^* = 5.0$, both incentive schemes yield the desired level of profit for P1. When $c$ deviates from its nominal value, however, the profit value decreases, as is to be expected, but when the pair $(q_1, q_2)$ is chosen according to (32), the incurred profit is less affected by such variations in $c$ around $c^*$. Hence, the robustness property prevails not only for incremental variations around the nominal value, but also in a reasonably large neighborhood of the nominal value.
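A comparison of this kind can be reproduced in a few lines. The sketch below is only an illustration: the parameter values $d_0 = 100$, $d_1 = 1$, $c_1 = 1$, $c_2 = 2$ are hypothetical (the values behind Fig. 1 are not given), the demand relation and quadratic costs are those assumed in this section, and the "single good" scheme is taken to put the whole gain $q_1 + q_2$ on good X.

```python
# Hypothetical parameters; only the nominal c* = 5.0 comes from the text.
D0, D1, C1, C2, C_STAR = 100.0, 1.0, 1.0, 2.0, 5.0


def nominal_point():
    """(x*, y*, z*) maximizing pi_1 + pi_2 at c = c* (quadratic costs)."""
    k = 1.0 / C1 + 1.0 / C2 + 1.0 / C_STAR
    lam = D0 / (D1 + 2.0 * k)
    return lam / C1, lam / C2, lam / C_STAR


def leader_profit(q1, q2, c):
    """Profit pi_1 of P1 under the affine policy (28) when the true cost is c."""
    xs, ys, zs = nominal_point()
    b = 1.0 + q1 + q2
    a = D0 - (xs + ys + zs) + b * zs
    z = a / (2.0 * b + c * D1)            # follower's best response
    x = xs + q1 * (z - zs)
    y = ys + q2 * (z - zs)
    price = (D0 - x - y - z) / D1
    return price * (x + y) - 0.5 * C1 * x**2 - 0.5 * C2 * y**2


ROBUST = (C_STAR / C1, C_STAR / C2)        # gains (32a), (32b)
SINGLE = (ROBUST[0] + ROBUST[1], 0.0)      # same total gain, all on good X

if __name__ == "__main__":
    for c in (4.0, 4.5, 5.0, 5.5, 6.0):
        print(c, leader_profit(*ROBUST, c), leader_profit(*SINGLE, c))
```

With these assumed numbers the two schemes agree at $c = c^*$, and the robust allocation yields the larger profit at the other sampled values of $c$.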


7.  Conclusion 

In this paper we have introduced a minimum sensitivity approach towards the solution of deterministic incentive design problems, which leads to optimum robust incentive policies that are least sensitive to the variations in the value of a parameter vector (from a nominal) characterizing the follower's cost function. Possible extensions of this minimum sensitivity approach are to the class of problems in which the leader has partial dynamic information (as in [5] and [9]) and to the class of stochastic incentive problems discussed in [11]. For an extension to the latter class of problems see [U].

D. H. Cansever, T. Başar / Incentive design problems

Abstract

In this paper we introduce the notion of "robust" incentive schemes in multi-agent decision problems with a hierarchical decision structure, and discuss derivation of such policies by minimizing, in addition to the usual (standard) Stackelberg performance indices, an appropriate sensitivity function. Such an approach has applications in decision problems wherein the leader does not know the exact values of some parameters characterizing the follower's cost functional, and seeks to robustify his optimum policy in the presence of deviations from the nominal values. An in-depth analysis of such incentive design problems is provided, and some concrete analytical results are obtained for general cost functionals with a convex structure. The results are then applied to an incentive design problem arising in economics, leading to some meaningful robust incentive policies.

I.  Introduction 

Optimum incentive design problems constitute a promising and mathematically challenging class of decision problems in economics and operations research ([5]-[8]), and have also recently attracted the attention of control theorists ([2]-[5], [9]-[11]) because of
the close relationship with Stackelberg games ([1], [4], [12]). Viewed as a dynamic Stackelberg game, an
optimum design problem involves a hierarchy in decision making and an information structure that allows the decision maker at the top of the hierarchy (to be called the leader) to acquire (perfect or partial) information on the actions of the other decision maker(s) (the so-called follower(s)). This available information enables the leader to design a policy (called an incentive scheme) that forces the follower(s) to a desired (from the leader's point of view) behavior. The utmost goal sought by the leader may be prescribed in the form of maximization of a utility function (or minimization of a cost function), or it may be some set of utopic points in an appropriate space, determined according to some criterion. The incentive scheme may be in the form of a threat strategy [9] or some smooth policy with appealing regularity conditions, as discussed in [5]. This latter reference [5], in particular, shows that in the case of two decision-maker problems, and when the follower's cost function is strictly convex, there exists an optimal strategy for the leader which is affine in the dynamic information. This, however, does not imply that the solution will be unique; in fact, there will exist, in general, a multitude of optimal incentive schemes.

Existence of multiple solutions to incentive design problems prompts a further design criterion according to which a further selection can be made. In this paper, we introduce such a selection criterion

*This work was supported in part by the Joint Services Electronics Program, under Contract N00014-79-C-0424, and in part by the Dept. of Energy, Electric Energy Systems Division, under Contract DE-AC01-81RA-50658 with Dynamic Systems, P.O. Box 423, Urbana, IL 61801.


under which robust incentive policies can be obtained for two-agent deterministic problems with strictly convex cost functionals. More specifically, we assume that the leader does not know the exact value of a parameter that characterizes the follower's cost function, and thereby his optimal response function. He may take a nominal value for this unknown parameter (assuming that such a candidate exists) and seek an optimal incentive scheme that is not only optimal for that chosen value, but is also least sensitive to deviations from the nominal. We call such incentive policies robust, and develop in this paper a new approach for the derivation of such policies by introducing an appropriate sensitivity function.

The problem is formulated in precise mathematical terms in Section II. Section III introduces the first and second order sensitivity functions and discusses derivation of robust affine policies for a general class of problems with convex cost functionals. Some special classes of problems with separable and singular cost functions are treated in Section IV, leading to some explicit, analytic solutions. Section V deals with an extension of these results to a larger class of incentive schemes which also include nonlinear policies. In Section VI, the results of Sections III and IV are applied to a problem arising in microeconomics, and it is shown that the unique robust affine incentive policy for the leader bears a very meaningful economic interpretation. The concluding remarks of Section VII end the paper.


II.  Problem  Formulation 

Consider a two-person deterministic dynamic game in normal form, described by the cost functionals $J_1(\gamma_1, \gamma_2)$ and $J_2(\gamma_1, \gamma_2, \alpha)$ for player 1 (the leader) and player 2 (the follower), respectively. Here, the strategies $\gamma_1$ and $\gamma_2$ belong to a priori specified strategy spaces $\Gamma_1$ and $\Gamma_2$, respectively, and $\alpha \in A \subset \mathbb{R}$ denotes a parameter on which the follower's cost functional depends. Let us denote the decision variables of the leader and of the follower by $u \in U$ and $v \in V$, respectively, where $U$ and $V$ are appropriate subsets of $\mathbb{R}^n$ and $\mathbb{R}^m$, and they represent the decision spaces of the leader and the follower, respectively. In this paper, we will assume that the follower has open-loop information, and hence take $V \equiv \Gamma_2$. By an abuse of notation, we also let $J_1(u, v)$ and $J_2(u, v, \alpha)$ denote the cost functionals over the product space $U \times V$, for each $\alpha \in A$. Let us further assume that

(i) $J_2(u, v, \alpha)$ is strictly convex¹ and twice continuously differentiable on $U \times V$, with $\alpha \in A$.

(ii) The leader has perfect access to the follower's action $v$.

(iii) The leader is uncertain about the actual value of $\alpha \in A$. However, he designs his strategy according to a nominal value of $\alpha$, say $\alpha^* \in A$, keeping in mind that $\alpha^*$ may not be the actual value.

¹ This restriction will be relaxed later in Section IV.


Under this setup, the problem faced by the leader is twofold:

a) To design a Stackelberg strategy $\gamma_1^* \in \Gamma_1$ which, by also taking into account rational reactions of the follower, leads to a desired value of $J_1$ which may be its global minimum over $U \times V$; let us denote such a value by $J_1^*$. More precisely, one of the objectives of the leader is to find a $\gamma_1^* \in \Gamma_1$ such that

$$J_1(\gamma_1^*, v) = J_1^* \quad \text{for all } v \in R_{\alpha^*}(\gamma_1^*), \qquad (1a)$$

where $R_\alpha(\gamma_1)$ denotes the optimal reaction set of the follower and it is defined for each fixed $\alpha \in A$ by

$$R_\alpha(\gamma_1) = \{v^0 \in V : J_2(\gamma_1, v^0, \alpha) \le J_2(\gamma_1, v, \alpha),\ \forall v \in V\}. \qquad (1b)$$

Under an additional technical restriction which will be delineated in the sequel, it has been shown in [3] that, for each fixed $\alpha \in A$, there exists an affine strategy for the leader which leads to $J_1^*$. More precisely, with $\alpha$ fixed at a nominal value $\alpha^*$, there exists an $n \times m$ matrix $Q(\alpha^*)$, satisfying the relation

$$\frac{\partial J_2}{\partial u}\,Q(\alpha^*) + \frac{\partial J_2}{\partial v} = 0 \qquad (2)$$

so that the incentive strategy

$$u = \gamma_1(v) = u^t + Q(\alpha^*)(v - v^t) \qquad (3)$$

forces the follower to $v = v^t$, provided that $\partial J_2(u, v, \alpha)/\partial u$ evaluated at $(u = u^t, v = v^t)$ does not vanish within an $\epsilon$-neighborhood of $\alpha^*$, where $(u^t, v^t)$ minimizes $J_1(u, v)$ over $U \times V$.

In general, equation (2) defines a class $\mathcal{Q}$ of $(n \times m)$ matrices which force the follower to $v = v^t$. We also note that there exist other strategies for the leader, which attain $J_1^*$ and they need not be of the affine form as in (3).


b) Since the leader does not have perfect knowledge of $\alpha$, the actual value may be different from $\alpha^*$, in which case the leader will end up with an inferior performance, because $\gamma_1^*$ so defined is optimal only at $\alpha = \alpha^*$. Therefore, it is highly desirable for the leader to have a robustness property associated with his optimal strategy. More precisely, the leader would like to have the sensitivity of the realized value of his cost functional, against the variations in $\alpha$ about its nominal value $\alpha^*$, be as small as possible. This property may be induced by making use of the intrinsic nonuniqueness of the solutions of (2), and also by introducing nonlinear strategies which satisfy (1a), as will be shown in the sections to follow.
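The nonuniqueness in (2) and the forcing property of (3) can be seen on a small example. The following is a minimal sketch under assumed data, not taken from the paper: $n = 2$, $m = 1$, leader cost $J_1 = u_1^2 + u_2^2 + v^2$ (so $u^t = (0, 0)$, $v^t = 0$), follower cost $J_2 = \tfrac12(u_1 - 1)^2 + \tfrac12(u_2 - 1)^2 + \tfrac12(v - \alpha)^2$ and $\alpha^* = 1$, for which (2) reduces to $q_1 + q_2 = -1$. The follower's reaction is found by a ternary search, which is a generic technique for scalar convex functions and is used here only for illustration.

```python
U_T = (0.0, 0.0)   # team-optimal leader decision for the assumed J1
V_T = 0.0          # team-optimal follower decision for the assumed J1


def follower_cost(u, v, alpha):
    """Hypothetical strictly convex follower cost J2(u, v, alpha)."""
    return 0.5 * (u[0] - 1.0) ** 2 + 0.5 * (u[1] - 1.0) ** 2 + 0.5 * (v - alpha) ** 2


def affine_strategy(q, v):
    """Incentive strategy (3): u = u^t + Q (v - v^t), with Q = (q1, q2)'."""
    return tuple(ut + qi * (v - V_T) for ut, qi in zip(U_T, q))


def best_response(q, alpha, lo=-10.0, hi=10.0, iters=200):
    """Follower's reaction: minimize J2(gamma_1(v), v, alpha) over v.

    Ternary search; valid because the composite cost is convex in v.
    """
    def cost(v):
        return follower_cost(affine_strategy(q, v), v, alpha)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if cost(m1) < cost(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Every pair with q1 + q2 = -1 satisfies (2) at alpha* = 1.
    for q in [(-0.5, -0.5), (-3.0, 2.0), (1.0, -2.0)]:
        print(q, best_response(q, alpha=1.0))
    # Away from the nominal value the reaction moves off v^t.
    print(best_response((-0.5, -0.5), alpha=1.3))
```

Each admissible pair drives the follower to $v^t$ at the nominal value, while the reaction at $\alpha \ne \alpha^*$ depends on which pair was chosen; that dependence is the freedom used for robustness.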


III.  Introduction  of  a  Sensitivity  Function 
and  Derivation  of  Robust  Solutions 


Let $\gamma_1 \in \Gamma_1$ be an incentive strategy for the leader, and $v_\alpha \in R_\alpha(\gamma_1)$. Towards the goal set in Section II, and as a measure of the sensitivity of $J_1(\gamma_1, v_\alpha)$ with respect to deviations of $\alpha$ from its nominal value $\alpha^*$, let us introduce the total derivative of $J_1(\gamma_1, v_\alpha)$ with respect to $\alpha$, evaluated at $\alpha = \alpha^*$, $u = u^t$, and $v = v^t$; more precisely, let us confine ourselves to affine strategies of the form (3) and, by abuse of terminology, let us define the first order sensitivity function of $J_1(\gamma_1, v_\alpha)$ with respect to $\alpha$, at $\alpha = \alpha^*$, as

$$I_1(\alpha^*) = \frac{dJ_1(\gamma_1(v_\alpha), v_\alpha)}{d\alpha}\bigg|_{\alpha=\alpha^*,\,u=u^t,\,v=v^t}. \qquad (4)$$

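However, if the leader enforces the pair $(u^t, v^t)$ which globally minimizes $J_1$ on the product space $U \times V$, $I_1(\alpha)$ identically vanishes for all $\alpha \in A$. In this case, the next term in the Taylor expansion of $J_1(\gamma_1, v_\alpha)$ around $\alpha^*$, which we call the second order sensitivity function of $J_1(\gamma_1, v_\alpha)$ with respect to $\alpha$, will have to be considered. Denoting it by $I_2(\alpha)$, and suppressing the arguments, we have

$$I_2(\alpha^*) = \frac12\,\frac{d^2 J_1(\gamma_1(v_\alpha), v_\alpha)}{d\alpha^2}\bigg|_{\alpha=\alpha^*,\,u=u^t,\,v=v^t} = \frac12\,\frac{dv_\alpha'}{d\alpha}\left(Q'\frac{\partial^2 J_1}{\partial u^2}Q + Q'\frac{\partial^2 J_1}{\partial v\,\partial u} + \frac{\partial^2 J_1}{\partial u\,\partial v}Q + \frac{\partial^2 J_1}{\partial v^2}\right)\frac{dv_\alpha}{d\alpha}. \qquad (5)$$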


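In order to find an expression for $dv_\alpha/d\alpha$, we note that the equation

$$\frac{\partial J_2}{\partial u}\,Q(\alpha^*) + \frac{\partial J_2}{\partial v}\bigg|_{u=\gamma_1(v_\alpha),\,v=v_\alpha} = 0 \qquad (6)$$

is an identity for all $\alpha \in A$, and it completely specifies the optimal response $v_\alpha$ of the follower when affine strategies $\gamma_1$ of the form (3) are used. The derivative of the above expression with respect to $\alpha$ would still vanish for all $\alpha \in A$. We then have, with $Q = Q(\alpha^*)$,

$$\left[Q'\frac{\partial^2 J_2}{\partial u^2}Q + Q'\frac{\partial^2 J_2}{\partial v\,\partial u} + \frac{\partial^2 J_2}{\partial u\,\partial v}Q + \frac{\partial^2 J_2}{\partial v^2}\right]\frac{dv_\alpha}{d\alpha} + Q'\frac{\partial^2 J_2}{\partial\alpha\,\partial u} + \frac{\partial^2 J_2}{\partial\alpha\,\partial v} = 0, \qquad \forall \alpha \in A. \qquad (7)$$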


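Since $J_2(u, v, \alpha)$ is strictly convex in $u$ and $v$ for all $\alpha \in A$, the coefficient matrix of $dv_\alpha/d\alpha$ is positive definite, and thereby invertible.² Combining (5) and (7) we obtain the second-order sensitivity function of $J_1(\gamma_1, v_\alpha)$ with respect to $\alpha$, at $\alpha = \alpha^*$, to be

$$I_2(\alpha^*) = \frac12\,\frac{dv_\alpha'}{d\alpha}\left(Q'\frac{\partial^2 J_1}{\partial u^2}Q + Q'\frac{\partial^2 J_1}{\partial v\,\partial u} + \frac{\partial^2 J_1}{\partial u\,\partial v}Q + \frac{\partial^2 J_1}{\partial v^2}\right)\frac{dv_\alpha}{d\alpha},$$

$$\frac{dv_\alpha}{d\alpha} = -\left[Q'\frac{\partial^2 J_2}{\partial u^2}Q + Q'\frac{\partial^2 J_2}{\partial v\,\partial u} + \frac{\partial^2 J_2}{\partial u\,\partial v}Q + \frac{\partial^2 J_2}{\partial v^2}\right]^{-1}\left(Q'\frac{\partial^2 J_2}{\partial\alpha\,\partial u} + \frac{\partial^2 J_2}{\partial\alpha\,\partial v}\right). \qquad (8)$$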


Remark 1. When the leader enforces his team solution, his objective will be to minimize $I_2(\alpha^*)$ over $Q$ subject to (2), since $I_1(\alpha^*)$ vanishes. On the other hand, there may be cases when he would prefer to enforce a point other than $(u^t, v^t)$, in which case $I_1(\alpha^*)$ will not necessarily be zero. However, when the nonvanishing $I_1(\alpha^*)$ does not depend on the choice of $Q$ from the class $\mathcal{Q}$ of $n \times m$ matrices which, together with an affine strategy of the form (3), enforce the follower to the desired point, one still has to consider $I_2(\alpha^*)$ as a measure for obtaining minimum sensitive solutions. This point will be further elucidated in Section VI. ■

The  problem  now  is  to  minimize  Imi*)  over  all 
(a.z)  real  matrices  0(  ;*)  subject  to  cne  constrain: 


“or  invert  ib  ill  tv  i 
tnac  the  Hessian  matrix  of 


is,  of  :ourse.  sufficient 
■>  be  full  rank. 


2 


.Aw.  1 


jj,  .-j.,. 

-rf  QC*  >  ♦  ~\ 


has  Co  sack  solutions  which  ars  on  chs  boundary.  In 
sooa  special,  but  sufficiently  general  cases,  such  solu¬ 
tions  can  be  obtained  analytically,  as  we  will  eluci¬ 
date  in  the  next  section. 


The  solution  to  this  optimization  problem  may  dictate 
some  of  the  entries  of  Q(n*>  to  cake  arbitrarily  large 
values  which  corresponds  to  high  gain  feedback.  How¬ 
ever,  if  U  is  a  bounded  set,  Che  value  of  such  an 
optimizing  strategy  may  not  belong  to  Che  sec  of  per¬ 
missible  controls.  It  is  therefore  necessary,  also  in 
j  view  of  the  fact  chat  high  gain  may  not  be  desirable, 

*  to  Impose  bounds  on  the  entries  of  Q(a*) ,  of  che  form 

i<!ijlskir  1-1 . .  J"1-—"  <10a) 


qll  q12 


Now, let us first assume that there exists an inner point of the set defined by (10a), which minimizes $I_2(a^*)$ subject to (9). Then, the set of first order necessary conditions for optimality are given in the proposition to follow.

Proposition 1: A set of first order necessary conditions for an inner-point solution $Q^*$ (satisfying (10a) with strict inequality) of the optimization problem formulated in this section is the existence of a vector of Lagrange multipliers $\lambda = (\lambda_1, \lambda_2, \ldots)'$, $\lambda_i \in R$, such that

[equation illegible in source; evaluated at $a = a^*$, $u = u^t$, $v = v^t$] (11a)


IV. Solutions to Some Special Cases

In this section we relax the hypotheses of Proposition 1 in two different directions for some special cases, and obtain some explicit solutions. We restrict attention primarily to two classes, viz. problems with separable cost functionals and the so-called singular incentive problems.

1. Separable cost functionals

Consider the class of problems in which the uncertain parameter $a$ affects the cost functional of the follower only through $v$, i.e. $J_2(u,v,a)$ is separable as

$$ J_2(u,v,a) = g_1(u,v) + g_2(v,a) \qquad (12) $$

in which case the term $\partial^2 J_2 / \partial a\, \partial u$ in (8) vanishes.

Let us assume, for the sake of simplicity in analysis, that $n = 2$, $m = 1$. Then, under the equality constraint (9), it is possible to write one component of $Q(a^*)$ in terms of the other component, and hence express $I_2(a^*)$ in terms of either $q_1$ or $q_2$. We observe that in the limit as $q_1$ (or $q_2$) $\to \infty$, $I_2(a^*)$ tends to its absolute minimum value zero. This, however, violates the constraint (10a), and therefore we have to look for boundary solutions.


Solving for $q_1$ in terms of $q_2$ from (9) and substituting into $I_2(a^*)$, we arrive at the optimization problem

[equation illegible in source] (13)


provided that [derivative illegible in source], evaluated at $(u = u^t, v = v^t)$, does not vanish within an $\varepsilon$-neighborhood of $a^*$.

Proof: Let us assume that there exists an inner-point minimizing solution $Q^* \in \mathcal{Q}$. Then, any matrix in the constraint set can be written as

[equation illegible in source]

where [...] $\in R$ and [...] $\in D \subset R^{n \times m}$, where $R^{n \times m}$ denotes the set of all $n \times m$ matrices with real entries and $D$ is characterized by [set definition illegible in source]. Since $D$ is a vector space, we should have

[equation illegible in source]

Hence the columns of $\nabla_Q I_2(a^*)$ must be linearly dependent on [...], which implies the existence of a set of scalars [...] such that (11a) is satisfied at an optimizing point. ■

When a solution satisfying (11) cannot be found, one


This is equivalent to maximizing the denominator of the second product term of (13) under the constraint $|q_2| \le k_2$. Since the denominator is strictly convex in $q_2$, the minimizing solution is

$$ q_2^* = -\mathrm{sgn}(q_2^0)\, k_2 \qquad (14) $$

where $q_2^0$ is the value of $q_2$ at which the derivative of the denominator with respect to $q_2$ vanishes, namely

$q_2^0 =$ [expression illegible in source]

Here we assume that $k_1$ is sufficiently large so that, so long as the solution to (13) is finite, $q_1$ solved from (9) satisfies the given constraint.


[expression illegible in source]

Then, $q_1^*$ is obtained via the constraint (9). If the constraint is not compatible, one should reformulate $I_2(a^*)$ in terms of $q_1$ and repeat the same procedure. This result can be extended to cases when $n > 2$. The following algorithm can be used to obtain $Q^*(a^*)$ which minimizes $I_2(a^*)$:

An algorithm to obtain the minimizing argument $Q^*(a^*)$ of $I_2(a^*)$:


(i) Compute $Q^0 = (q_1^0, \ldots, q_n^0)$.

(ii) Set $q_i^* = -\mathrm{sgn}(q_i^0)\, k_i$, $i = 1, 2, \ldots, n$; check for satisfaction of the constraint (9).

(iii) If (9) is not satisfied, set $q_i^* = -\mathrm{sgn}(q_i^0)\, k_i$ for all $i = 1, \ldots, n$ but $i \ne j$, for some $j$. Search for $|q_j^*| \le k_j$ satisfying (9); $j = 1, \ldots, n$.

(iv) If (9) is not satisfied, set $q_i^* = -\mathrm{sgn}(q_i^0)\, k_i$ for all $i = 1, \ldots, n$ but $i \ne j$, $i \ne l$, for some $j$, $l$. Search for $|q_j^*| \le k_j$ and $|q_l^*| \le k_l$ satisfying (9); $j, l = 1, 2, \ldots, n$.

Stop when (9) is satisfied.
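The boundary search described by this algorithm can be sketched in a few lines of Python. This is only an illustrative sketch under assumptions that are not in the original: the case m = 1 is taken, the equality constraint (9) is written as a single linear equation c·q = d (the text calls (9) a linear constraint, but its coefficients are not legible here, so `c` and `d` are hypothetical), and only the vertex check and the one-free-component search are implemented. The function names are hypothetical as well.

```python
from typing import List, Optional, Sequence


def sign(x: float) -> float:
    """sgn(x), with sgn(0) taken as +1 (an arbitrary tie-break)."""
    return 1.0 if x >= 0 else -1.0


def boundary_solution(q0: Sequence[float], k: Sequence[float],
                      c: Sequence[float], d: float,
                      tol: float = 1e-9) -> Optional[List[float]]:
    """Vertex check, then free one component at a time.

    q0 : unconstrained stationary values q_i^0
    k  : bounds, |q_i| <= k_i
    c,d: ASSUMED linear form of constraint (9), sum_i c_i q_i = d
    """
    n = len(q0)
    # Push every component to the bound opposite to the sign of q_i^0.
    vertex = [-sign(q0[i]) * k[i] for i in range(n)]

    def residual(q: Sequence[float]) -> float:
        return sum(ci * qi for ci, qi in zip(c, q)) - d

    if abs(residual(vertex)) <= tol:
        return vertex

    # Release one component j and solve the linear constraint for it.
    for j in range(n):
        if abs(c[j]) <= tol:
            continue  # constraint does not involve q_j
        rest = sum(c[i] * vertex[i] for i in range(n) if i != j)
        qj = (d - rest) / c[j]
        if abs(qj) <= k[j] + tol:
            candidate = list(vertex)
            candidate[j] = qj
            return candidate

    return None  # two or more components would have to be released


print(boundary_solution([1.0, -2.0], [3.0, 3.0], [1.0, 1.0], -1.0))
```

The sketch returns the first feasible candidate, mirroring "stop when (9) is satisfied"; it does not evaluate $I_2(a^*)$, so comparing several feasible candidates would need the sensitivity function itself.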


reaction of the follower to any announced strategy of the form (3), where $Q(a^*) \in \mathcal{Q}$, is $v = v^t$, which is confirmed by (2) as a first order necessary condition, and (18) provides a second order sufficient condition for the optimality of $v = v^t$.

For this class of incentive design problems, it is possible to obtain an inner-point solution analytically, because of the particular structure of the sensitivity function. Towards this end, let us assume that the follower's cost function is linear in $u$ and separable as in (12). Let us further assume that $n = 2$, $m = 1$.

In this case, constructing $q_1$ in terms of $q_2$ as in §IV.1, $I_2(a^*)$ becomes

[equation illegible in source] (19a)

where [...] and [...] are defined by (11).

The derivative of $I_2(a^*)$ with respect to $q_2$ vanishes at

This algorithm can be extended to the case when $m > 1$ for functionals $J_2(u,v,a)$ which are separable as in (12).

2. Singular incentive problems

An interesting case occurs when the leader's control affects the follower's cost functional linearly. In this case, the follower's cost functional is not convex on the product space $U \times V$, and therefore the previous theory on linear incentive problems is not directly applicable; however, an extension is possible as we elucidate in the sequel. Towards this end, first note that when the leader announces an affine strategy of the form (3), the follower's cost functional $J_2$ becomes a function of only $v$, for a given $a \in A$. Accordingly, if the functional $J_2(u^t + Q(a)(v - v^t), v, a)$ is strictly convex on $V$ for each $a \in A$, and if equation (2) admits a nonempty family $\mathcal{Q}$ of solutions at the desired point $(u^t, v^t)$, then the least-sensitive incentive design problem becomes meaningful for this particular class of cost functionals. More precisely, for a given $u^t \in U$, let

$$ \frac{d^2 J_2(u,v,a)}{dv^2} = [\text{expression illegible in source}] > 0 \qquad (18) $$

for all $v \in V$, $a \in A$ and $Q(a^*) \in \mathcal{Q}$, where $\mathcal{Q}$ is assumed to be nonvoid. Under these assumptions, the


[expression for the stationary point illegible in source] (19b)

provided that the denominator of (13) does not vanish, and $k_i$, $i = 1, 2$, are sufficiently large. Furthermore, this is the unique stationary point of $I_2(a^*)$. This result leads to the following proposition.

Proposition 2: The minimum sensitive (robust) linear incentive design problem formulated in this section admits a unique solution $(q_1^*, q_2^*)$ where $q_2^*$ is given by (19b), and $q_1^*$ is obtained through the linear constraint (9) which relates $q_1$ and $q_2$, provided that the denominator of (19a) does not vanish, and $(q_1^*, q_2^*) \in \mathcal{Q}$.

Proof: This result follows from the following four properties of the function $F(q_2) = I_2(a^*)$:

(i) There exists a finite number $M$ such that $\lim_{q_2 \to \infty} F(q_2) = \lim_{q_2 \to -\infty} F(q_2) = M$.

(ii) [statement illegible in source]

(iii) $F(q_2)$ is continuous, except at [point illegible in source].

(iv) $F(q_2)$ has a single stationary point $q_2^*$.

These four properties readily lead to the conclusion that $q_2^*$ is the unique minimizing solution for $F$. ■

V. Use of Nonlinear Strategies in Sensitivity Considerations

The analysis of previous sections was confined to the class of linear strategies. Although this class is rich enough to provide optimal solutions to incentive problems [5], the use of nonlinear strategies may provide additional degrees of freedom when dealing with sensitivity problems. For example, when $n = m = 1$, the constraint (9) determines $Q(a^*)$ uniquely, except for generic cases where optimum linear strategies do not exist. Hence in this case, the set $\mathcal{Q}$ is a singleton and it does not allow sensitivity considerations. However, if the leader is permitted to enlarge his strategy space by including a suitable nonlinear term in his control, he may have extra degrees of freedom to reduce the sensitivity of his performance to changes in the uncertain parameter in the follower's cost functional. Towards this end, let us assume that the strategy space of the leader is the set of all mappings from $V$ onto $U$, twice continuously differentiable with respect to $v$, and with bounded first and second derivatives. For the case $n = m = 1$, and with separable $J_2(u,v,a)$, the second order sensitivity function takes the form

[equation illegible in source] (21)

The problem is to minimize $I_2(a^*)$ over $\partial\gamma/\partial v$ and $\partial^2\gamma/\partial v^2$, at $a = a^*$, $u = u^t$, and $v = v^t$.

[equations illegible in source] (22a), (22b)

Here, $\partial\gamma/\partial v$ is completely determined by the constraint (9). Carrying out this minimization, one obtains

[equation illegible in source] (23)

and

[equation illegible in source] (24)

where the partials are evaluated at $a = a^*$, $u = u^t$, and $v = v^t$. It is also assumed here that $k_2$ is sufficiently large. These results lead to the following proposition.


Proposition 3: A representation of the least sensitive optimal incentive strategy within the class of twice continuously differentiable policies with bounded first and second derivatives with respect to $v$ is given by

[equation for $\gamma^*(v)$ illegible in source] (25)


Proof: Follows from the previous discussion. ■

The case when $n > 2$ can be treated in a similar manner. Components of $\partial\gamma/\partial v$ are obtained via the algorithm presented in Section IV, and $\partial^2\gamma_i/\partial v^2$ is given by

[equation illegible in source], $i = 1, 2, \ldots, n$. (26)

Remark: Since [expression illegible in source] $> 0$, $i = 1, 2, \ldots, n$, at the nominal point, the enlarging of the strategy space of the leader definitely improves his sensitivity function. One could introduce further improvements by allowing for higher order continuously differentiable strategies. ■


VI. An Example from Economics

In this section we discuss an example from microeconomics, which illustrates some of the results obtained in this paper, especially the ones on singular incentive problems. Let us consider a duopolistic market model in which two firms compete; the leader, P1, produces the goods X and Y, and the follower produces the good Z. All three goods are substitutable within an $\varepsilon$-neighborhood of the equilibrium point of the market and they are sold at the same price $p$ which is assumed to satisfy the linear demand relation

$$ x + y + z = d_0 - d_1 p \qquad (27) $$

where $x$, $y$, and $z$ represent the quantities to be produced from each good X, Y, and Z, respectively; $d_0$ and $d_1$ are positive constants. It is also assumed that firms have a logarithmic cost function, compatible with the appealing hypothesis that cost per unit of production decreases as the level of production increases.

Then, the profit functions of the firms, which are to be maximized, become

$$ \pi_1 = \left( \frac{d_0 - (x + y + z)}{d_1} \right)(x + y) - c_1 \ln(x + 1) - c_2 \ln(y + 1) - L_1, $$

$$ \pi_2 = \left( \frac{d_0 - (x + y + z)}{d_1} \right) z - c_3 \ln(z + 1) - L_2, $$

$$ x, y, z > 0. $$

Here $c_i > 0$, $i = 1, 2, 3$, are parameters reflecting the differences in the cost of production of X, Y, and Z, respectively. It is assumed that $c_i$, $i = 1, 2, 3$, are constant within an $\varepsilon$-neighborhood of the equilibrium point of the market. $L_i > 0$, $i = 1, 2$, are fixed costs of the firms.

For this problem there exists a sequence of closed-loop policies for P1 which force the follower to $z = 0$, a result which is valid for any demand relation in which the price is a strictly decreasing function of $z$. However, for the leader the limiting strategy is not well-defined since it requires the gain vector $Q(a^*)$ to have unbounded elements (i.e., infinite threat). In general, such a strategy is neither credible for the follower, nor affordable for the leader. Instead, an alternative is to assume that P1 imposes a suitable point of equilibrium, compatible with the duopolistic nature of the problem.

Towards this end, let the objective functional considered by the leader be

$$ J_1 = \nu \pi_1 + (1 - \nu)\pi_2, \qquad 0 < \nu < 1 \qquad (29) $$

where $\nu$ is large enough to provide the leader better profit and not so large as to lead to a noncredible incentive scheme. A possible upper bound for $\nu$ may be the one which provides the follower a profit comparable with what he or she would make in a Nash equilibrium case. The objective functional of the follower is still $\pi_2$. Let us assume, for simplicity in the analysis to follow, that $\nu = 1/2$, in which case the leader can roughly guarantee two-thirds of the market. To complete the formulation of the problem, let us assume that the parameter in the cost of production of Z, namely $c_3$, is uncertain for the leader. However, he knows a nominal value of $c_3$, say $c_3^*$, around which $c_3$ can vary. The goal of the leader is to design a strategy, using his strategic variables $x$ and $y$, which will enforce the follower to the maximizing arguments of $J_1$ when $c_3 = c_3^*$; and in addition he seeks a strategy under which his profit function is least sensitive to variations in $c_3$. Let us also assume that the leader confines himself to affine strategies. This problem is within the scope of singular incentive problems discussed in Section IV.2 with the sole exception that the enforced point is not the team solution of his profit function for which he desires the least-sensitivity property. In this case, his first order sensitivity function $I_1(a^*)$, as given by (4), does not vanish. However, it can be shown that $I_1(a^*)$ is invariant under the choice of $Q(a^*)$ which takes its values in the constraint defined by (2). Hence, in this problem, for the sensitivity analysis, one still has to consider $I_2(a^*)$.

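Using the objective function (29) with $\nu = 1/2$, it can be shown that the desired equilibrium point of the leader in terms of the parameters of the problem is given as follows, under certain not totally restrictive conditions:

$$ x^* = \frac{(d_0 + 6) + \sqrt{(d_0 + 6)^2 - 8\left(1 + \frac{c_2}{c_1} + \frac{c_3^*}{c_1}\right) c_1 d_1}}{4\left(1 + \frac{c_2}{c_1} + \frac{c_3^*}{c_1}\right)} - 1 \qquad (30a) $$

$$ y^* = \frac{c_2}{c_1}(x^* + 1) - 1 \qquad (30b) $$

$$ z^* = \frac{c_3^*}{c_1}(x^* + 1) - 1. \qquad (30c) $$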
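A quick numerical check of this enforced point can be sketched as follows. The sketch assumes, as an illustration rather than as the authors' stated derivation, that with equal weights the leader's objective is proportional to $\pi_1 + \pi_2$, so that the enforced point is a stationary point of the total profit $[(d_0 - (x+y+z))/d_1](x+y+z) - c_1 \ln(x+1) - c_2 \ln(y+1) - c_3 \ln(z+1)$. The function names and the parameter values are hypothetical.

```python
import math


def enforced_point(d0, d1, c1, c2, c3):
    """Closed-form stationary point of the (assumed) total profit.

    With t = x + 1, stationarity reduces to the quadratic
    2*k*t**2 - (d0 + 6)*t + c1*d1 = 0, where k = 1 + c2/c1 + c3/c1.
    The larger root is taken.
    """
    k = 1.0 + c2 / c1 + c3 / c1
    disc = (d0 + 6.0) ** 2 - 8.0 * k * c1 * d1
    if disc < 0:
        raise ValueError("no real stationary point for these parameters")
    t = ((d0 + 6.0) + math.sqrt(disc)) / (4.0 * k)
    return t - 1.0, (c2 / c1) * t - 1.0, (c3 / c1) * t - 1.0


def total_profit_gradient(x, y, z, d0, d1, c1, c2, c3):
    """Partial derivatives of the total profit with respect to x, y, z."""
    s = x + y + z
    marginal_revenue = (d0 - 2.0 * s) / d1
    return (marginal_revenue - c1 / (x + 1.0),
            marginal_revenue - c2 / (y + 1.0),
            marginal_revenue - c3 / (z + 1.0))


# Hypothetical parameter values, chosen only so that all quantities are positive.
params = dict(d0=20.0, d1=1.0, c1=1.0, c2=2.0, c3=1.5)
x, y, z = enforced_point(**params)
grad = total_profit_gradient(x, y, z, **params)
print((x, y, z), grad)
```

All three gradient components come out at rounding-error level, which is what stationarity requires; the check says nothing about second-order conditions.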

The class of affine strategies which yields $x^*$, $y^*$, and $z^*$ when the uncertain parameter $c_3$ takes the nominal value $c_3^*$, is represented by

$$ \gamma_1(z) = [\, x^* + q_1(z - z^*),\; y^* + q_2(z - z^*) \,]' \qquad (31) $$

where $q_1$ represents the coefficient which indicates how P1 would modify $x$ if $z \ne z^*$, and similarly, $q_2$ relates the change in $z$ to $y$. When the constraint (2) is computed for this problem, it turns out that it forces $q_1$ and $q_2$ to lie in the linear manifold defined by

[equation illegible in source] (32)

Now, any pair $(q_1, q_2)$ satisfying (32) and (13) would induce P2 to produce the amount $z^*$ when the uncertain parameter $c_3$ takes the nominal value $c_3^*$. From this set $\mathcal{Q}(a^*)$, the pair $(q_1^*, q_2^*)$ which minimizes the sensitivity function of the leader is computed using (20), and is given by

[equation missing in source]

which indicates that, in a least sensitive (robust) incentive scheme, the leader should allocate the incentive among the goods he produces according to their respective costs of production.

VII. Conclusion

In this paper, we have introduced a minimum sensitivity approach towards the solution of deterministic incentive design problems, which leads to robust incentive policies that are least sensitive to variations in the value of a parameter (from a nominal) characterizing the follower's cost function. Even though this approach has been developed with respect to a single parameter, it is possible to envision natural (conceptual) extensions to the multi-parameter situation, in which case the second order sensitivity function $I_2(a^*)$ will be defined as an appropriate norm of the matrix whose elements are the total derivatives of $J_1(\gamma_1(v_a), v_a)$ with respect to the components of the vector $a$. Yet other possible extensions of this minimum sensitivity approach are to the class of problems in which the leader has partial dynamic information (as in [J], [9]) and to the class of stochastic incentive problems discussed in [11]. Such extensions are currently under study, and will be reported in forthcoming papers.

Abstract. In this paper we analyze a class of two-agent team decision problems with a hierarchical decision structure, wherein one of the decision makers may have a slightly different perception of the overall team goal, with this slight variation not known by the other agent who is assumed to occupy the hierarchically dominant position. The leading agent has access to dynamic information and his role is to announce a policy (incentive scheme) which would lead to achievement of the overall team goal, in spite of the slight variations in the other agent's perception of that goal, which are not known or predictable by him. We may call a policy with such an additional feature a robust incentive policy. We obtain, in the paper, robust policies for the leading agent, for a general cost functional with convex structure, which are least sensitive to variations in the following agent's perception of the team goal. In some special cases, we show that the robust feature of the incentive scheme is maintained regardless of the magnitude and nature of the variations, and illustrate the theory with two application examples arising in microeconomics and armament limitation and control.

Keywords. Economics; game theory; incentives; multicriteria decision problems; robustness of decision policies.


I. INTRODUCTION

The main characteristic of team decision problems is the presence of several decision makers with a common goal, this common goal being quantified in a common objective functional which is to be optimized jointly (but possibly in a decentralized fashion) by all decision makers. An underlying stipulation in research on team theory has been the assumption that all agents perceive the common goal in exactly the same way, and face exactly the same mathematical optimization problem [Marschak and Radner (1972)]. In this paper we relax this basic assumption and allow (in the context of two-agent problems) one agent to have a somewhat different perception of the common goal and to quantify it in a slightly different way. Furthermore, we will assume that the other agent is not informed of the existence of this discrepancy in the perception of the common goal, but is able to monitor the decision of the former by occupying a higher (dominant) position in the decision process. The problem we address is the design of a suitable strategy for the agent who occupies the hierarchically superior position and who still adopts the original team objective functional as his own, such that the change in the minimum value of the team cost because of the discrepancy in the perceptions of the common goal is kept to a minimum. Ideally, the hierarchically superior member of the team would seek not to be affected by this discrepancy, if this is at all possible.

We will approach this problem using optimum incentive design schemes [Ho, Luh and Olsder (1982) and Zheng and Basar (1981)], which involve a hierarchy in decision making and a suitable information structure for the decision maker at the top of the hierarchy, that allows him to design a policy which in its turn induces the other decision maker with a different objective functional to behave in a desired manner. Recently in [Cansever and Basar (1982)], optimal incentive schemes have been used, within the context of Stackelberg games, to minimize the effect of changes in the parameters of the follower's cost functional on the leader's optimum cost value, by simultaneously achieving a desired goal. Here, we direct our attention to problems which are nominally team, and derive incentive schemes that are least sensitive to deviations in the hierarchically inferior decision maker's perceptions of the uncertain parameters. The fact that the underlying goal is common (that is, the nominal optimization problem is a team problem, a property that may be destroyed in the decision process) can be exploited to obtain very appealing robust schemes, as we will show in the sections to follow.

The problem is formulated in section II. In section III we introduce sensitivity functions and obtain robust affine strategies for a general class of convex cost functionals. In section IV the results of the previous section are applied to a problem arising in microeconomics. Section V deals with the generalization of the above results to the multiparameter case, and section VI illustrates the basic idea of robust policies for the multiparameter case using a model from armament limitation and control.


II. PROBLEM FORMULATION

Consider a two-person deterministic team decision problem in normal form, described by the cost functional $J(\gamma_1, \gamma_2, a)$, where $\gamma_i \in \Gamma_i$ denotes the strategy of DMi (i'th decision maker) and $a \in A \subset R$ is a parameter on which the cost functional depends. Let $u \in U = R^n$, $v \in V = R^m$ denote the decision variables of DM1 and DM2, respectively, and assume that $\Gamma_1 = \{\gamma : V \to U\}$, $\Gamma_2 = V$; i.e. DM1 has access to the decision value of DM2. DM1 also knows the precise value of the parameter $a$ (say $a^0$), whereas DM2 perceives its value differently (say $a^+ \in A$), which in turn gives rise to a different cost functional from his point of view, namely $J(\gamma_1, \gamma_2, a^+) \ne J(\gamma_1, \gamma_2, a^0)$. Furthermore, DM1 does not know the exact value perceived by DM2, but his ultimate goal is to see that the lowest possible value is attained for $J(\gamma_1, \gamma_2, a^0)$. The decision structure of the problem is assumed to be hierarchical, in the sense that DM1 is the dominant decision maker and has the power and means of declaring his policy in advance and enforcing it on the other DM. Hence, while DM2 is faced with the problem of minimizing $J(\gamma_1(v), v, a^+)$ over $v \in V$, DM1 wishes to choose a $\gamma_1 \in \Gamma_1$ (in total ignorance of $a^+$) that would eventually lead to a minimum value for $J(\gamma_1(v), v, a^0)$.

By an abuse of notation, let $J(u,v,a)$ denote the cost functional on the product space $U \times V$, for each $a \in A$, and assume that this functional is strictly convex on $U \times V$, for each $a \in A$, is twice continuously differentiable in its first two arguments and continuously differentiable in its third argument. Furthermore, let us denote the unique minimum of $J(u,v,a^0)$ by $(u^t, v^t) \in U \times V$. Restricting DM1 to affine policies in $\Gamma_1$, we first note that the policy

$$ \gamma_1(v) = u^t + P(v - v^t), \qquad (1) $$

where $P$ is an $(n \times m)$-matrix, has the appealing property that if DM2's perception of $a$ is $a^0$, then $\min_{v \in V} J(\gamma_1(v), v, a^0)$ leads to the desired value $v^t \in V$ for any matrix $P$. If $a^+ \ne a^0$, however, the problem ceases to be a cooperative one since the problem faced by DM2 is

$$ \min_{v \in V} J(\gamma_1(v), v, a^+) \qquad (2a) $$

whose minimizing solution (say $v^+ \in V$) satisfies (and is uniquely determined by) the equation

$$ \nabla_u J(u^+, v^+, a^+)\, P + \nabla_v J(u^+, v^+, a^+) = 0 \qquad (2b) $$

where $u^+ = \gamma_1(v^+) = u^t + P(v^+ - v^t)$ and is not necessarily the same as $u^t$. The problem we address, in the sequel, is whether it is possible to choose a robust policy $\gamma_1$ (by choosing $P$ appropriately) so that either $u^+ = u^t$ and $v^+ = v^t$, or the discrepancies will be small whenever $a^+$ is close to $a^0$. In other words, we seek either total insensitivity or minimum sensitivity of the optimum value of $J(u,v,a^0)$ to variations in the perception of DM2 (of $a$) by a proper choice of $\gamma_1$.


III. INTRODUCTION OF A SENSITIVITY FUNCTION AND DERIVATION OF ROBUST SOLUTIONS

As a measure of the sensitivity of $J(u,v,a^0)$ with respect to deviations of $a^+$ from its nominal value $a^0$, let us introduce the total derivative of $J(u,v,a^0)$ with respect to its third argument, which we call the first order sensitivity function of $J(u,v,a)$ with respect to $a$:

$$ I_1(a^0) = \frac{dJ(u^t, v^t, a^0)}{da} = \bigl[ \nabla_u J(u^t, v^t, a^0)\, P + \nabla_v J(u^t, v^t, a^0) \bigr] \frac{dv}{da} $$

The first product term of the above expression vanishes at the nominal solution point. However, this is not necessarily the case if $a^+ \ne a^0$.
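In order to find an expression for $dv/da$, let us first note that the equation that determines the optimum response of DM2 to (1), for a general $a$:

$$ \nabla_u J(\gamma_1(v), v, a)\, P + \nabla_v J(\gamma_1(v), v, a) = 0 \qquad (4) $$

is an identity for all $a \in A$, and hence its derivative with respect to $a$ still vanishes for all $a \in A$. Such a consideration readily leads to

$$ \bigl[ P' \nabla_{uu} J\, P + P' \nabla_{uv} J + \nabla_{vu} J\, P + \nabla_{vv} J \bigr] \frac{dv}{da} + \bigl[ \nabla_{au} J\, P + \nabla_{av} J \bigr]' = 0 $$

from which

$$ \frac{dv}{da} = - \bigl[ P' \nabla_{uu} J\, P + P' \nabla_{uv} J + \nabla_{vu} J\, P + \nabla_{vv} J \bigr]^{-1} \bigl[ \nabla_{au} J\, P + \nabla_{av} J \bigr]' \qquad (5) $$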

Now, if we choose $P$ such that

$$ \nabla_{au} J(u^t, v^t, a^0)\, P + \nabla_{av} J(u^t, v^t, a^0) = 0 \qquad (6) $$

then $dv/da$ vanishes at the nominal solution point. By the same token, the $n$'th order sensitivity function of $J$ with respect to $a$:

$$ I_n(a) = \frac{d^n J(u,v,a)}{da^n} $$

would carry, by chain rule, $dv/da$ as a product term; hence the sensitivity functions of order 1, 2 and 3 vanish at the nominal solution point. This situation, in turn, implies that when DM2's perception $a^+$ stays within an $\varepsilon$-neighborhood of its nominal value $a^0$, the 3rd order Taylor approximation of the effect of this discrepancy is zero. Therefore, when the class $\mathcal{P}$ of matrices $P$ defined by (6) is not empty, affine strategies

$$ \gamma_1 = u^t + P(v - v^t), \quad P \in \mathcal{P} \qquad (7) $$

carry very appealing sensitivity properties.
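To see condition (6) at work, the following sketch uses a hypothetical scalar cost $J(u,v,a) = (u - a)^2 + (v - a^2)^2 + uv$ with nominal value $a^0 = 1$; neither the cost nor the numbers come from the original. For this cost $\nabla_{au} J = -2$ and $\nabla_{av} J = -4a$, so (6) gives $P = -2$, and the nominal team solution is $u^t = v^t = 2/3$. The code computes DM2's response by a golden-section search and compares DM1's loss under $P = -2$ with the loss under the constant policy $P = 0$.

```python
def J(u, v, a):
    """Hypothetical cost, strictly convex in (u, v) for every a."""
    return (u - a) ** 2 + (v - a ** 2) ** 2 + u * v


A0 = 1.0                 # nominal parameter value known to DM1
UT = VT = 2.0 / 3.0      # minimizer of J(., ., A0)
P_ROBUST = -2.0          # solves (6): -2*P - 4*A0 = 0


def argmin_1d(f, lo, hi, iters=80):
    """Golden-section search for the minimizer of a convex f on [lo, hi]."""
    ratio = (5 ** 0.5 - 1) / 2
    x1 = hi - ratio * (hi - lo)
    x2 = lo + ratio * (hi - lo)
    f1, f2 = f(x1), f(x2)
    for _ in range(iters):
        if f1 <= f2:
            hi, x2, f2 = x2, x1, f1
            x1 = hi - ratio * (hi - lo)
            f1 = f(x1)
        else:
            lo, x1, f1 = x1, x2, f2
            x2 = lo + ratio * (hi - lo)
            f2 = f(x2)
    return (lo + hi) / 2


def leader_loss(P, a_plus):
    """Increase in J(., ., A0) when DM2 optimizes with perceived value a_plus."""
    v = argmin_1d(lambda w: J(UT + P * (w - VT), w, a_plus), -10.0, 10.0)
    u = UT + P * (v - VT)
    return J(u, v, A0) - J(UT, VT, A0)


for a_plus in (1.0, 1.05, 1.1):
    print(a_plus, leader_loss(P_ROBUST, a_plus), leader_loss(0.0, a_plus))
```

For this particular example the loss under $P = -2$ works out to $(a^+ - 1)^4 / 3$, so the terms of order 1, 2 and 3 in the deviation vanish, whereas under $P = 0$ the loss is $((a^+)^2 - 1)^2$, which is of second order.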

Let us now assume that $J(u,v,a)$ is linear in $a$; more precisely,

$$ J(u,v,a) = g(u,v) + a\, h(u,v) \qquad (8) $$

where $\nabla_u h(u^t, v^t)$ is not identically zero. Then $\nabla_{au} J(u,v,a)$ and $\nabla_{av} J(u,v,a)$ become independent of $a$, and it becomes possible to choose $P$ such that sensitivity functionals of all orders vanish, for all values of $a \in A$. Such a choice would be a $P$ satisfying

$$ \nabla_u h(u^t, v^t)\, P + \nabla_v h(u^t, v^t) = 0, \qquad (9) $$

which always exists since $\nabla_u h(u^t, v^t) \ne 0$. In this case, DM1 can induce DM2 to choose $\gamma_2 = v^t$, independent of his perception of $a$, that is even if $a^+ \ne a^0$.
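A small numerical illustration of this total-insensitivity property, under assumptions that are not in the original: scalar decisions, $g(u,v) = u^2 + v^2$, $h(u,v) = (u - 1)^2 + (v - 2)^2$ and $a^0 = 1$, so that $u^t = 0.5$, $v^t = 1$. Condition (9) then reads $-P - 2 = 0$, i.e. $P = -2$. The sketch computes DM2's best response for several perceived values $a^+$ and contrasts it with the constant policy $P = 0$.

```python
def g(u, v):
    return u ** 2 + v ** 2


def h(u, v):
    return (u - 1.0) ** 2 + (v - 2.0) ** 2


UT, VT = 0.5, 1.0                                # minimizer of g + 1.0 * h
GRAD_H = (2.0 * (UT - 1.0), 2.0 * (VT - 2.0))    # (dh/du, dh/dv) at (UT, VT)
P_ROBUST = -GRAD_H[1] / GRAD_H[0]                # solves (9) in the scalar case


def argmin_1d(f, lo, hi, iters=80):
    """Golden-section search for the minimizer of a convex f on [lo, hi]."""
    ratio = (5 ** 0.5 - 1) / 2
    x1 = hi - ratio * (hi - lo)
    x2 = lo + ratio * (hi - lo)
    f1, f2 = f(x1), f(x2)
    for _ in range(iters):
        if f1 <= f2:
            hi, x2, f2 = x2, x1, f1
            x1 = hi - ratio * (hi - lo)
            f1 = f(x1)
        else:
            lo, x1, f1 = x1, x2, f2
            x2 = lo + ratio * (hi - lo)
            f2 = f(x2)
    return (lo + hi) / 2


def response(P, a_plus):
    """DM2's best response to the affine policy u = UT + P * (v - VT)."""
    def cost(v):
        u = UT + P * (v - VT)
        return g(u, v) + a_plus * h(u, v)
    return argmin_1d(cost, -10.0, 10.0)


for a_plus in (0.2, 1.0, 5.0):
    print(a_plus, response(P_ROBUST, a_plus), response(0.0, a_plus))
```

With $P = -2$ the response stays at $v^t = 1$ for every perceived value tried, while with $P = 0$ it moves to $2a^+/(1 + a^+)$.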

Some more insight can be gained into this result by taking a slightly different approach. Consider the team decision problem described by the cost functional

$$ J(u, v_1, v_2, a) = g(u, v_1) + a\, h(u, v_2) \qquad (10) $$

where both $v_1$ and $v_2$ belong to $V$. Let $g(u, v_1)$ and $h(u, v_2)$ both be strictly convex on $U \times V$ and further assume that

$$ J(u, v_1, v_2, a) = g(u, v_1) + a\, h(u, v_2) $$

is strictly convex on $U \times V \times V$ for each $a \in A$.

We now view the team decision problem (10) as a Stackelberg game where DM1, the leader, with cost functional $J(u, v_1, v_2, a^0)$, faces two hypothetical followers DM$^g$ and DM$^h$ with cost functionals $g(u, v_1)$ and $h(u, v_2)$, respectively. The problem faced by DM1 is to devise a strategy which would induce DM$^g$ and DM$^h$ to play $v_1 = v^t$, $v_2 = v^t$, simultaneously, where $v^t$ minimizes, together with $u^t$, the function $J(u,v,a^0)$ given by (8). Naturally, when $v_1 = v_2 = v^t$, the realized value of $\gamma_1$ should be $u^t$ (which is the side condition); where the triple $(u^t, v^t, v^t)$ jointly minimizes $J(u, v_1, v_2, a^0)$ over $U \times V \times V$. Let us assume that DM1 observes the decisions of DM$^g$ and DM$^h$.

Let us further assume that DM1 adopts an affine strategy of the form

$$ \gamma_1(v_1, v_2) = u^t + P_1(v_1 - v^t) + P_2(v_2 - v^t), \qquad (11) $$

where $P_1$ and $P_2$ are $n \times m$ matrices satisfying

$$ \nabla_u g(u^t, v^t)\, P_1 + \nabla_v g(u^t, v^t) = 0 \qquad (12a) $$

and

$$ \nabla_u h(u^t, v^t)\, P_2 + \nabla_v h(u^t, v^t) = 0. \qquad (12b) $$

Note that because of strict convexity of $g(u, v_1)$ and $h(u, v_2)$, these are the necessary and sufficient conditions for $v_1 = v^t$ and $v_2 = v^t$ to minimize $g(u, v_1)$ and $h(u, v_2)$, respectively, when $u$ is given by (11).

Now, since $(u^t, v^t)$ minimizes (8) when $a = a^0$, we first have

$$ \nabla_u g(u^t, v^t) = -a^0 \nabla_u h(u^t, v^t) \qquad (13a) $$

$$ \nabla_v g(u^t, v^t) = -a^0 \nabla_v h(u^t, v^t) \qquad (13b) $$

and substituting these into (12a) we obtain

$$ -a^0 \bigl[ \nabla_u h(u^t, v^t)\, P_1 + \nabla_v h(u^t, v^t) \bigr] = 0 \qquad (14) $$

which, when compared with (12b), leads to the conclusion that $P_1$ satisfies the same equation as $P_2$ does, which is precisely (9). Furthermore, a solution always exists since $\nabla_u h(u^t, v^t) \ne 0$, which was one of our hypotheses in the formulation.

Now, let us suppose that one single player, say DM2, chooses a linear combination of $g(u, v)$ and $h(u, v)$ as his cost functional, with decision space $V$. More precisely, let

$J_2(u, v) = k_1 g(u, v) + k_2 h(u, v)$

where $k_1, k_2 \in R$ are such that $J_2(u, v)$ is strictly convex on $U \times V$. From the previous discussion, it follows that if DM1 announces a strategy of the form

$\gamma_1(v) = u^t + P (v - v^t)$

where $P$ satisfies (9) or (12a), DM2 is induced to choose $v = v^t$, independent of $k_1$ and $k_2$, as long as $J_2(u, v)$ remains strictly convex on $U \times V$, which establishes the desired result. It is interesting to observe that using sensitivity analysis, the optimal $P$ was found to satisfy

$\nabla_u h(u^t, v^t) P + \nabla_v h(u^t, v^t) = 0,$

whereas the previous discussion shows that $P$ also satisfies

$\nabla_u g(u^t, v^t) P + \nabla_v g(u^t, v^t) = 0,$

as long as $h(u, v)$ and $g(u, v)$ are both strictly convex on $U \times V$.
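The induced-response property above can be checked numerically on a small case. The following is a minimal sketch under stated assumptions: the scalar functions `g` and `h`, the nominal weight `ALPHA_0 = 1`, and the helper `follower_response` are hypothetical choices made only for illustration and do not come from the text. The slope `P_ROBUST` solves the scalar version of $\nabla_u h P + \nabla_v h = 0$ at the team solution, and the follower minimizes $k_1 g + k_2 h$ along the announced affine policy.

```python
# Hypothetical scalar example; g, h and the weights are illustrative assumptions.
def g(u, v):
    return u ** 2 + v ** 2


def h(u, v):
    return (u - 1.0) ** 2 + (v - 2.0) ** 2


ALPHA_0 = 1.0
U_T, V_T = 0.5, 1.0          # minimizer of g + ALPHA_0 * h (first-order conditions)
H_U = 2.0 * (U_T - 1.0)      # dh/du at (U_T, V_T)
H_V = 2.0 * (V_T - 2.0)      # dh/dv at (U_T, V_T)
P_ROBUST = -H_V / H_U        # solves h_u * P + h_v = 0


def follower_response(k1, k2, p, lo=-10.0, hi=10.0, iters=200):
    """Minimize k1*g + k2*h over v by ternary search, with u = U_T + p*(v - V_T)."""
    def cost(v):
        u = U_T + p * (v - V_T)
        return k1 * g(u, v) + k2 * h(u, v)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if cost(m1) <= cost(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for k1, k2 in [(1.0, 0.5), (1.0, 2.0), (3.0, 1.0)]:
        robust = follower_response(k1, k2, P_ROBUST)
        naive = follower_response(k1, k2, 0.0)
        print(k1, k2, round(robust, 6), round(naive, 6))
```

In this sketch the response under `P_ROBUST` stays at `V_T` for every pair of weights tried, whereas a constant policy (`p = 0`) lets the response drift with the weights.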


IV.  AN EXAMPLE FROM MICROECONOMICS

In this section we discuss an example from microeconomics, which illustrates some of the results obtained in the previous sections. Let us consider a duopolistic market consisting of two firms DM1 and DM2, who produce the goods X and Y, respectively. Duopolistic microeconomic models have recently been treated in [Cansever and Basar (1982)] and [Simaan and Cruz (1973)] within the context of optimum incentive design problems; in the present example we discuss a more general model. Let us assume that these goods are substitutable and are sold at the same market price $p$ which is determined by an inverted demand function $f(x + y)$. Here $x \in R_+$ and $y \in R_+$ represent the quantities to be produced from each good X and Y, respectively. Let the inverted demand function $f(\cdot)$ be continuously differentiable with the properties

$df(z)/dz < 0, \quad f(\infty) = 0; \quad f(0) = \infty,$

where

$z = x + y.$   (15)
Let the cost functionals of the firms be given by $c_1(x)$ and $c_2(y)$ where $c_i(\cdot)$ maps $R_+$ into $R_+$ ($i = 1, 2$), and is assumed to be strictly increasing and continuously differentiable. The profit functions of the firms, which are to be maximized, become

$\pi_1(x, y) = f(x + y) x - c_1(x)$   (16a)

$\pi_2(x, y) = f(x + y) y - c_2(y).$   (16b)

Let us now assume that DM1 and DM2 have agreed to collude and to jointly maximize a linear combination of their profit functions, given as

$\pi(x, y, \alpha) = \pi_1(x, y) + \alpha \pi_2(x, y)$   (17)

where $\alpha \in R_+$ represents DM2's market share, with DM1's market share normalized to unity. Let us further assume that the firms have arrived at a common acceptable value $\alpha = \alpha^\circ$ after some negotiations which eventually led to the collusion situation described above.

Let $(x^\circ, y^\circ) \in R_+ \times R_+$ denote a unique pair of decisions (production levels) that maximizes $\pi(x, y, \alpha^\circ)$. Such a pair satisfies the set of equations

$\nabla_x \pi(x^\circ, y^\circ, \alpha^\circ) = 0$   (18a)

$\nabla_y \pi(x^\circ, y^\circ, \alpha^\circ) = 0.$   (18b)

Now let $(x^*, y^*)$ denote a unique Nash equilibrium pair (Cournot solution) for the noncooperative nonzero-sum game which involves two players (DM1 and DM2) with objective functionals $\pi_1(x, y)$ and $\pi_2(x, y)$, respectively. The pair $(x^*, y^*)$ satisfies the relation

$\pi_1(x^*, y^*) \ge \pi_1(x, y^*) \quad \forall x \in R_+$   (19a)

$\pi_2(x^*, y^*) \ge \pi_2(x^*, y) \quad \forall y \in R_+.$   (19b)

Concerning the equilibrium pairs $(x^\circ, y^\circ)$ and $(x^*, y^*)$, we also assume that the following inequalities hold

$\pi_1(x^\circ, y^\circ) > \pi_1(x^*, y^*)$

$\pi_2(x^\circ, y^\circ) > \pi_2(x^*, y^*)$

since otherwise there would not be any incentive for the players (at least for one of them) to participate in a cooperative agreement.

Now, let us assume that DM2 has decided to increase his market share from $\alpha^\circ$ to $\alpha^+$, without informing DM1, while DM1 still uses the value $\alpha^\circ$ in the total objective function. Then, with an element of cooperation still present, the firms would be faced with a noncooperative game having objective functionals $\pi(x, y, \alpha^\circ)$ for DM1 and $\pi(x, y, \alpha^+)$ for DM2. Let $(x^+, y^+)$ denote the unique equilibrium pair for this new game, and assume that

$\pi_1(x^+, y^+) > \pi_1(x^*, y^*)$

$\pi_2(x^+, y^+) > \pi_2(x^*, y^*).$

Since the pair $(x^\circ, y^\circ)$ uniquely maximizes $\pi(x, y, \alpha^\circ)$ (the objective functional of DM1) we clearly have the inequality

$\pi(x^\circ, y^\circ, \alpha^\circ) > \pi(x^+, y^+, \alpha^\circ).$

Finally, let us assume that

i) DM1 has enough power to announce his strategy in advance and enforce it on DM2;

ii) DM1 has access to DM2's action $y$, and chooses his policy as an affine function of $y$;

iii) With DM1's policy chosen as $\gamma_1(y) = x^\circ + P(y - y^\circ)$, the objective functional $\pi(\gamma_1(y), y, \alpha)$ is strictly concave in $y$, for all $\alpha \in [\alpha^\circ, \alpha^{up}]$ and for a given $P$ whose value will be specified in the sequel.

This example clearly falls within the scope of the class of team decision problems analyzed in the previous sections. Moreover, since the team objective functional $\pi(x, y, \alpha)$ is affine in $\alpha$, DM1 is able to induce $y^\circ$ independent of DM2's perception of $\alpha$, by announcing a strategy of the form

$\gamma_1(y) = x^\circ + P (y - y^\circ)$   (20a)

where $P$ is given by (from (9))

$P = - \dfrac{\nabla_y f(x^\circ + y^\circ) y^\circ + f(x^\circ + y^\circ) - \nabla_y c_2(y^\circ)}{\nabla_x f(x^\circ + y^\circ) y^\circ}.$   (20b)

(20b), combined with (18), readily gives

$P = \dfrac{x^\circ}{\alpha^\circ y^\circ} > 0.$   (20c)

This is a result which holds true for the general class of profit functions described in this example. To gain some insight into this result, let us suppose that DM2 decides to produce a quantity $y > y^\circ$; then, given the strategy (20) of DM1, the total quantity produced will be

$z = x^\circ + y^\circ + (1 + P)(y - y^\circ).$   (21)

Assuming that the profit functions $\pi_1$ and $\pi_2$ are identical, and $\alpha^\circ = 1$ (meaning equal shares in the market), we have $P = 1$, and (21) becomes

$z = x^\circ + y^\circ + 2 (y - y^\circ).$
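To make the symmetric case concrete, the following sketch evaluates (20b) and DM2's induced response for one specific market. Everything numerical here is an assumption chosen for illustration, not data from the text: the inverse demand $f(z) = z^{-1/2}$ (which satisfies the stated properties of $f$), the identical costs $c_i(q) = q^2$, and $\alpha^\circ = 1$. Under these assumptions the collusive optimum is symmetric and solves $(2q)^{-1/2} = 4q$, that is $q = 2^{-5/3}$.

```python
# Hypothetical market data (not from the source): f(z) = z**-0.5, c_i(q) = q**2.
ALPHA_0 = 1.0                      # nominal market-share weight
X_0 = Y_0 = 2.0 ** (-5.0 / 3.0)    # collusive optimum: solves (2q)**-0.5 = 4q


def f(z):
    """Inverse demand: decreasing, unbounded near zero, vanishing at infinity."""
    return z ** -0.5


def f_prime(z):
    return -0.5 * z ** -1.5


def cost(q):
    return q ** 2


def joint_profit(x, y, alpha):
    """pi(x, y, alpha) = pi_1 + alpha * pi_2, as in (17)."""
    price = f(x + y)
    return price * x - cost(x) + alpha * (price * y - cost(y))


def incentive_slope():
    """P from (20b): minus (d pi_2 / dy) over (d pi_2 / dx) at (X_0, Y_0)."""
    z0 = X_0 + Y_0
    d_pi2_dy = f_prime(z0) * Y_0 + f(z0) - 2.0 * Y_0
    d_pi2_dx = f_prime(z0) * Y_0
    return -d_pi2_dy / d_pi2_dx


def total_quantity(y, p):
    """Total output (21) when DM2 produces y and DM1 follows the affine policy."""
    return X_0 + Y_0 + (1.0 + p) * (y - Y_0)


def dm2_best_response(alpha_plus, p, lo=1e-6, hi=2.0, iters=200):
    """Maximize pi(X_0 + p*(y - Y_0), y, alpha_plus) over y by ternary search."""
    def payoff(y):
        return joint_profit(X_0 + p * (y - Y_0), y, alpha_plus)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if payoff(m1) >= payoff(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)
```

In this assumed market `incentive_slope()` agrees with (20c), and `dm2_best_response` returns `Y_0` whatever weight `alpha_plus` DM2 uses, which is the insensitivity claimed for policy (20a).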

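V.  EXTENSION TO THE MULTIPARAMETER CASE

In the previous sections, we have restricted our discussion to the case $\alpha \in R$. When $\alpha \in R^r$, the first order sensitivity function $I_1(\alpha^\circ)$ becomes a $(1 \times r)$ vector given by

$I_1(\alpha^\circ) = dJ(u^+, v^+, \alpha^\circ)/d\alpha^+ = (\nabla_u J P + \nabla_v J)(dv/d\alpha)$   (22)

where

$(dv/d\alpha)' = -[\nabla_{\alpha u} J P + \nabla_{\alpha v} J][P' \nabla_{uu} J P + P' \nabla_{vu} J + \nabla_{uv} J P + \nabla_{vv} J]^{-1}$   (23)

with $\alpha = \alpha^\circ$, $u = u^t$, $v = v^t$.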


Now, if

$\nabla_{\alpha v} J(u^t, v^t, \alpha^\circ) \subset \mathcal{R}(\nabla_{\alpha u} J(u^t, v^t, \alpha^\circ))$   (24)

(where $\mathcal{R}$ denotes the range) then it is possible to choose a $P$ such that $dv/d\alpha = 0$ at the nominal solution point. In this case, sensitivity functions of orders 1, 2 and 3 vanish at the nominal solution point; hence, affine strategies have very appealing sensitivity properties in the multiparameter case, too. When condition (24) is not satisfied, one has to minimize a suitable norm of the leading sensitivity function with respect to the $(n \times m)$ matrix $P$. Since $I_1(\alpha^\circ)$ vanishes at the nominal solution point, one has to consider the second-order sensitivity functional $I_2(\alpha^\circ)$ which is an $(r \times r)$ nonnegative definite matrix. A suitable norm for minimization is, in this case, $\mathrm{Tr}[I_2(\alpha^\circ)]$. We are now faced with an unconstrained optimization problem on $P$, for which a closed-form solution does not in general exist; however, numerically it is a feasible problem. We will illustrate some of these ideas, in the next section, by solving an example that involves an arms race between two nations.
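The numerical route mentioned above can be sketched for the simplest nontrivial case: scalar decisions ($n = m = 1$) and a two-dimensional parameter ($r = 2$). With $dv/d\alpha$ taken from (23) and the first-order term vanishing at the nominal point, the second-order sensitivity matrix reduces to $M H^{-1} M'$, with $M = \nabla_{\alpha u} J P + \nabla_{\alpha v} J$ and $H$ the bracketed matrix in (23), so for scalar decisions its trace is $|M|^2 / H$. The Hessian entries and mixed derivatives below are hypothetical numbers, and the grid search is only one possible way to carry out the minimization; neither is taken from the text.

```python
def trace_i2(p, j_uu, j_uv, j_vv, j_au, j_av):
    """Tr[I_2] for scalar u, v: |J_au * p + J_av|^2 / (J_uu p^2 + 2 J_uv p + J_vv)."""
    curvature = j_uu * p * p + 2.0 * j_uv * p + j_vv
    mismatch = [a * p + b for a, b in zip(j_au, j_av)]
    return sum(m * m for m in mismatch) / curvature


def minimize_trace(j_uu, j_uv, j_vv, j_au, j_av, lo=-5.0, hi=5.0, steps=10001):
    """Grid search over the scalar slope p; returns (best_p, best_value)."""
    best_p = lo
    best_val = trace_i2(lo, j_uu, j_uv, j_vv, j_au, j_av)
    for i in range(1, steps):
        p = lo + (hi - lo) * i / (steps - 1)
        val = trace_i2(p, j_uu, j_uv, j_vv, j_au, j_av)
        if val < best_val:
            best_p, best_val = p, val
    return best_p, best_val


if __name__ == "__main__":
    # Assumed data: Hessian [[2, 1], [1, 2]]; J_au = (1, 0) and J_av = (0, 1),
    # so J_av is not in the range of J_au and condition (24) fails.
    print(minimize_trace(2.0, 1.0, 2.0, (1.0, 0.0), (0.0, 1.0)))
```

For these assumed numbers the objective is $(P^2 + 1)/(2P^2 + 2P + 2)$, whose minimum over $P$ is attained at $P = 1$ with value $1/3$; the sensitivity cannot be driven to zero, only reduced.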

VI.  AN EXAMPLE FROM THE PROBLEM OF ARMAMENT LIMITATION AND CONTROL

In their papers on armament race and control [Simaan and Cruz (1975a) and Simaan and Cruz (1975b)], Simaan and Cruz have modeled the arms race problem as a noncooperative differential game between two nations. A salient feature of this model is that, when the respective cost functionals are taken to be quadratic in the decision variables, the resulting optimal state trajectory yields a discretized version of the armament model proposed earlier by Richardson [Richardson (1960)]. We will consider here the case when the two nations, DM1 and DM2, have agreed to reduce their respective armament expenditures. Such a situation inevitably requires the presence of an element of cooperation between DM1 and DM2, since any significant departure from the armament level jointly agreed upon may eventually lead to the original high armament expenditure.

Towards the formulation of this problem, let us assume that the goals of the DMs can be represented by two objective functionals $J_i(x_1, x_2, u_1, u_2)$, $i = 1, 2$, wherein DMi aims to optimize $J_i$. In order to incorporate the cooperation element discussed above, we will adopt the Pareto optimal equilibrium concept, which will be realized [Schmitendorf and Leitmann (1974)] if the DMs jointly optimize

$J(x_1, x_2, u_1, u_2) = k_1 J_1(x_1, x_2, u_1, u_2) + k_2 J_2(x_1, x_2, u_1, u_2)$   (25)

where $k_i \in R_+$; $u_1 \in R_+$ and $u_2 \in R_+$ denote DM1's and DM2's armament investments, respectively, and $x_i$ represents the armament level of DMi, $i = 1, 2$, which further satisfies

$x_i = f_i(x_{i0}, u_i), \quad i = 1, 2,$   (26)

where $f_i : R_+ \times R_+ \to R_+$ is a continuous function of $x_{i0}$ and $u_i$, and is strictly increasing in its second argument. Here, $x_{i0}$ denotes the initial armament level of DMi.


In order to obtain some explicit results, let us adopt the quadratic objective functional model proposed by Simaan and Cruz [Simaan and Cruz (1975a) and Simaan and Cruz (1975b)], because of its analytical tractability and other appealing features in relation to other existing models; namely, let


*1*  . 


and 


*  '  si*i 


>1,1:  j-l.i; 


(28) 


where 

s  >o.  q?>o.  s.wj, 

1  •"  1 


Here, $x_{i0}$ denotes the given initial armament level of DMi and equation (27) reveals that each DM wants to reduce the gap that exists between his armament level and a linear function of the other DM's armament level, and at the same time to minimize his expenditure. We refer to [Simaan and Cruz (1975b)] for an elaborated interpretation of (27). Under this set-up, there exists a unique pair $(u_1^t, u_2^t)$, minimizing $J(x_1, x_2, u_1, u_2)$ as a function of $\alpha^\circ = k_2 / k_1$, which corresponds to the pair $(u^t, v^t)$ in the general discussion of sections II and V.


As may be the case, one of the DMs, say DM2, may deviate from $u_2^t$. The reason behind such a move may be that DM2 totally ignores the cooperation, and minimizes his own objective functional. Assuming that each DM can monitor the decisions of his adversary, this situation would immediately give rise to a Nash equilibrium with high armament expenditures. Since we have assumed that each DM desires to reduce his expenditures while maintaining a certain balance of powers, such a unilateral and large deviation will be unlikely. In its stead, we will assume that DM2 may have an incentive to perform a relatively small deviation from the Pareto equilibrium point, being motivated by one of the following three considerations:

i) DM2 may decide to promote his relative importance in the agreement, which is reflected by an increase of the value of $\alpha$ from $\alpha^\circ$ to $\alpha^+$, without informing DM1, while DM1 still uses the value $\alpha^\circ$ in his objective functional;

ii) DM2 may develop a different perception of the values of one or more coefficients in the team objective functional without informing DM1. Let us assume, for instance, that DM2 has decided to place higher priority and emphasis on reducing the gap between his armament level and the linear functional of DM1's armament level than on minimizing his expenditure; more precisely, that he has decided to increase the value of $Q_2$ from $Q_2^\circ$ to $Q_2^+$.

iii) Both (i) and (ii) may be present.


We now analyze these three cases separately.

Case (i). This is similar to the analysis of section IV. The optimal strategy for DM1, which leads to $(u_1^t, u_2^t)$ as final outcome, independent of possible deviations in DM2's perception of $\alpha^\circ$, is given by

$\gamma_1^{(i)}(u_2) = u_1^t + P^{(i)} (u_2 - u_2^t)$   (29)


where 

p(i)  V^Moq^-VVio+u^-v,)) 

*  Rj  ( uj -*»,)+<} 2  <V10-up-v2) 


(10) 


Case (ii). This situation corresponds to objective functionals affine in the uncertain parameter (cf. eq. (3)), in which case the analysis of section III prevails. Hence, there exists an optimum robust strategy realizing the team solution independent of DM2's different perceptions of $Q_2$, and such a strategy is given by

(ii),  ,  c  1  ,  t,  .... 

y.  (u.)  -  u,  -  (u--u,)  .  (3i) 

i  i  3’J1  “  ‘ 

Case (iii). Here, condition (24) is not satisfied. Hence, within the class of affine policies, there does not exist any element which makes the cost that DM1 incurs completely insensitive to discrepancies in DM2's perceptions in more than one parameter. In order to overcome this difficulty, we adopt, as in section V, the scalarized sensitivity function $\mathrm{Tr}[I_2(\alpha^\circ)]$, and minimize it subject to the constraint that the strategy of DM1 satisfies

$\gamma_1^{(iii)}(u_2) = u_1^t + P^{(iii)} (u_2 - u_2^t).$   (32)


This problem can be shown to admit a unique solution which can be obtained explicitly. Hence, when DM1 is uncertain about DM2's perception of both $\alpha$ and $Q_2$, there still exists an affine strategy which minimizes an appropriate scalar function representing the sensitivity of DM1's incurred cost with respect to deviations in these coefficients from their nominal values, and such a strategy is given by

$\gamma_1^{(iii)}(u_2) = u_1^t + P^{(iii)} (u_2 - u_2^t).$


In the preceding analysis, $P^{(i)}$ is the same coefficient as DM1 would have used in his strategy in a Stackelberg game with DM2 being the follower and DM1 enforcing the point $(u_1^t, u_2^t)$. On the other hand, in case (ii), by announcing a strategy of the form (31), DM1 makes DM2's objective functional independent of the uncertain coefficient $Q_2$. Therefore, DM2's discrepancies do not affect the team solution anymore. However, when the number of uncertain coefficients is large as compared with the dimension of the DMs' decision vectors there still exists a compromise, which is to minimize the cumulative effect of variations of uncertain parameters around their nominal value: $\gamma_1^{(iii)}(u_2)$ is designed to perform such a compromise.

ABSTRACT

In this paper we analyze a class of two-agent team decision problems with a hierarchical decision structure, wherein one of the decision makers may have a slightly different perception of the overall team goal, with this slight variation not known by the other agent who is assumed to occupy the hierarchically dominant position. The leading agent has access to dynamic information and his role is to announce such a policy (incentive scheme) which would lead to achievement of the overall team goal, in spite of the slight variations in the other agent's perception of that goal, which are not known or predictable by him. We may call a policy with such an additional feature a "minimum sensitivity" incentive policy. We obtain, in the paper, "minimum sensitivity" policies for the leading agent, for a general cost functional with convex structure, which are least sensitive to variations in the following agent's perception of the team goal. In some special cases, we show that the robust feature of the incentive scheme is maintained regardless of the magnitude and nature of the variations, and illustrate the theory with an example arising in armament limitation and control.

I.  INTRODUCTION

The main characteristic of team decision problems is the presence of several decision makers with a common objective functional which is to be optimized jointly (but possibly in a decentralized fashion) by all decision makers. An underlying stipulation in research on team theory has been the assumption that all agents perceive the common goal in exactly the same way, and face exactly the same mathematical optimization problem [Marschak and Radner (1972)]. In this paper we relax this basic assumption and allow (in the context of two-agent problems) one agent to have a somewhat different perception of the common goal and to quantify it in a slightly different way. Furthermore, we will assume that the other agent is not informed of the existence of this discrepancy in the perception of the common goal, but is able to monitor the decision of the former by occupying a higher (dominant) position in the decision process. The problem we address is the design of a suitable strategy for the agent who occupies the hierarchically superior position and who still adopts the original team objective functional as his own, such that the change in the minimum value of the team cost because of the discrepancy in the perceptions of the common goal is kept to a minimum. Ideally, the hierarchically superior member of the team would seek not to be affected by this discrepancy, if this is at all possible.

We will approach this problem using optimum incentive design schemes [Ho, Luh, and Olsder (1982) and Zheng and Başar (1982)], which involve a hierarchy in decision making and a suitable information structure for the decision maker at the top of the hierarchy, that allows him to design a policy which in its turn induces the other decision maker with a different objective functional to behave in a desired manner. Recently in [Cansever and Başar (1982)], optimal incentive schemes have been used, within the context of Stackelberg games, to minimize the effect of changes in the parameters of the follower's cost functional on the leader's optimum cost value, by simultaneously achieving a desired goal. Here, we direct our attention to problems which are nominally team, and derive incentive schemes that are least sensitive to deviations in the hierarchically inferior decision maker's perceptions of the uncertain parameters. The fact that the underlying goal is common (that is, the nominal optimization problem is a team problem, a property that may be destroyed in the decision process) can be exploited to obtain very appealing minimum sensitivity strategies, as we will show in the sections to follow.

The problem is formulated in Section II. In Section III we introduce sensitivity functions and obtain robust affine strategies for a general class of convex cost functionals. In Section IV we provide a geometrical interpretation for total insensitivity when the objective functional is affine in the unknown parameter. Section V deals with the generalization of some of these results to the multiparameter case, and Section VI illustrates the basic ideas developed in this paper using a model from armament limitation and control. Concluding remarks of Section VII end the paper.

II.  PROBLEM FORMULATION

Consider a two-person deterministic team decision problem in normal form, described by the cost functional $J(\gamma_1, \gamma_2, \alpha)$ where $\gamma_i \in \Gamma_i$ denotes the strategy of DMi (i'th decision maker) and $\alpha \in A \subset R$ is a parameter on which the cost functional depends. Let $u \in U = R^n$, $v \in V = R^m$ denote the decision variables of DM1 and DM2, respectively, and assume that $\Gamma_1 = \{\gamma_1 : V \to U\}$, $\Gamma_2 = V$; i.e. DM1 has access to the decision value of DM2. DM1 also knows the precise value of the parameter $\alpha$ (say $\alpha^\circ$), whereas DM2 perceives its value differently (say $\alpha^+$), which in turn gives rise to a different cost functional from his point of view, namely, $J(\gamma_1, \gamma_2, \alpha^+)$. Furthermore, DM1 does not know the exact value perceived by DM2, but his ultimate goal is to see that the lowest possible value is attained for $J(\gamma_1, \gamma_2, \alpha^\circ)$. The decision structure of the problem is assumed to be hierarchical, in the sense that DM1 is the dominant decision maker and has the power and means of declaring his policy in advance and enforcing it on the other DM. Hence, while DM2 is faced with the problem of minimizing $J(\gamma_1(v), v, \alpha^+)$ over $v \in V$, DM1 wishes to choose $\gamma_1 \in \Gamma_1$ (in total ignorance of $\alpha^+$) that would eventually lead to a minimum value for $J(\gamma_1(v), v, \alpha^\circ)$.

By an abuse of notation, let $J(u, v, \alpha)$ denote the cost functional in the product space $U \times V$, for each $\alpha \in A$, and assume that this functional is strictly convex on $U \times V$, for each $\alpha \in A$, is twice continuously differentiable in its first two arguments and continuously differentiable in its third argument. Furthermore, let us denote the unique minimum of $J(u, v, \alpha^\circ)$ by $(u^t, v^t) \in U \times V$. Restricting DM1 to affine policies in $\Gamma_1$, we first note that the policy

$\gamma_1(v) = u^t + P (v - v^t),$   (1)

where $P$ is an $(n \times m)$-matrix, has the appealing property that if DM2's perception of $\alpha$ is $\alpha^\circ$, then $\min_{v \in V} J(\gamma_1(v), v, \alpha^\circ)$ leads to the desired value $v^t \in V$ for any matrix $P$. If $\alpha^+ \ne \alpha^\circ$, however, the problem ceases to be a cooperative one since the problem faced by DM2 is

$\min_{v \in V} J(\gamma_1(v), v, \alpha^+)$   (2a)

whose minimizing solution (say $v^+ \in V$) satisfies (and is uniquely determined by) the equation

$J_u(u^+, v^+, \alpha^+) P + J_v(u^+, v^+, \alpha^+) = 0$   (2b)¹

where $u^+ = \gamma_1(v^+) = u^t + P (v^+ - v^t)$ and is not necessarily the same as $u^t$. The problem we address, in the sequel, is whether it is possible to choose a robust policy (by choosing $P$ appropriately) so that either $u^+ = u^t$ and $v^+ = v^t$, or the discrepancies will be small whenever $\alpha^+$ is close to $\alpha^\circ$; in other words, we seek either total insensitivity or minimum sensitivity of the optimum value of $J(u, v, \alpha^\circ)$ to variations in the perception of DM2 (of $\alpha$) by a proper choice of $P$.

¹ Here $J_u$ and $J_v$ are row vectors of dimensions $1 \times n$ and $1 \times m$, respectively, denoting the partial derivatives with respect to the corresponding decision variables.

III.  INTRODUCTION OF A SENSITIVITY FUNCTION AND

As a measure of the sensitivity of $J(u, v, \alpha^\circ)$ with respect to deviations in the perception of DM2 of $\alpha$ from its nominal value $\alpha^\circ$, let us introduce the total derivative of $J(u, v, \alpha^\circ)$ with respect to $\alpha^+$, when $u = u^+$, $v = v^+$, satisfying (2b), and at the point $\alpha^+ = \alpha^\circ$. We call this function the first-order sensitivity function of $J(u, v, \alpha^\circ)$ with respect to $\alpha$, at $\alpha = \alpha^\circ$, in view of (1) and the optimal response of DM2 as characterized (uniquely) by (2b):

$I_1(\alpha^\circ) = \left. dJ(u^+, v^+, \alpha^\circ)/d\alpha^+ \right|_{\alpha^+ = \alpha^\circ} = \left. (dJ/dv^+)(dv^+/d\alpha^+) \right|_{\alpha^+ = \alpha^\circ} = [J_u(u^t, v^t, \alpha^\circ) P + J_v(u^t, v^t, \alpha^\circ)] v^t_\alpha,$   (3)

where

$v^t_\alpha = \left. dv^+(\alpha^+)/d\alpha^+ \right|_{\alpha^+ = \alpha^\circ}$

and is determined from (2b). To obtain an expression for $v^t_\alpha$, we note that (2b) is in fact an identity for all $\alpha^+ \in A$, since it uniquely determines the optimal response of DM2 to the announced policy (1) of DM1, with his perceived value for $\alpha$ being $\alpha^+$. Hence, differentiating (2b) with respect to $\alpha^+$, and evaluating the resulting expression at $\alpha^+ = \alpha^\circ$, we obtain

$[P' J_{uu} P + P' J_{vu} + J_{uv} P + J_{vv}] v^t_\alpha + [J_{\alpha u} P + J_{\alpha v}]' = 0$   (4)

$v^t_\alpha = -[P' J_{uu} P + P' J_{vu} + J_{uv} P + J_{vv}]^{-1} [J_{\alpha u} P + J_{\alpha v}]'$   (5)

where the arguments are evaluated at $u = u^t$, $v = v^t$, $\alpha = \alpha^\circ$. Note that the required inverse in (5) exists under the initial hypothesis that $J$ is strictly convex in $(u, v)$ for all $\alpha \in A$.

Now, since the pair $(u^t, v^t)$ globally minimizes $J(u, v, \alpha^\circ)$, we already know that

$J_u(u^t, v^t, \alpha^\circ) = 0, \quad J_v(u^t, v^t, \alpha^\circ) = 0,$   (6)

in view of which the first product term of (3) and hence $I_1(\alpha^\circ)$ vanish. Then, the dominating term in the Taylor expansion of $J(u^+, v^+, \alpha^\circ)$ around $\alpha^\circ$ is determined by the second-order sensitivity function:

$I_2(\alpha^\circ) = \left. d^2 J(u^+, v^+, \alpha^\circ)/d\alpha^{+2} \right|_{\alpha^+ = \alpha^\circ}$
$\quad = \left. [(dv^+/d\alpha^+)' (d^2 J/dv^{+2}) (dv^+/d\alpha^+) + (dJ/dv^+)(d^2 v^+/d\alpha^{+2})] \right|_{\alpha^+ = \alpha^\circ}$
$\quad = v^{t\prime}_\alpha [P' J_{uu} P + P' J_{vu} + J_{uv} P + J_{vv}] v^t_\alpha + [J_u P + J_v] v^t_{\alpha\alpha}.$   (7)

Since the second term is zero, in view of (6), $I_2(\alpha^\circ)$ vanishes if and only if $v^t_\alpha = 0$, which requires from (5) that there exist a $P$ satisfying

$J_{\alpha u}(u^t, v^t, \alpha^\circ) P + J_{\alpha v}(u^t, v^t, \alpha^\circ) = 0.$   (8)

A sufficient condition for this is, of course,

$J_{\alpha u} \ne 0,$   (9)

which is also necessary if the second term in (8) does not vanish (at least one component is nonzero).

When $v^t_\alpha$ vanishes, not only the second-order sensitivity function, but also the third-order sensitivity function

$I_3(\alpha^\circ) = \left. d^3 J(u^+, v^+, \alpha^\circ)/d\alpha^{+3} \right|_{\alpha^+ = \alpha^\circ}$   (10)

vanishes, because it carries (by the chain rule of differentiation) only terms that involve either $v^t_\alpha$ or $[J_u P + J_v]$ as products. Hence, under the condition that (8) admits at least one solution, and when DM1 employs the corresponding policy (1), if DM2's perception $\alpha^+$ (of $\alpha$) stays within an $\varepsilon$-neighborhood of its nominal value $\alpha^\circ$, the 3rd order Taylor approximation of the effect of this discrepancy is zero. We now summarize this appealing feature of the linear policy (1) in the following proposition.

Proposition 1: Let condition (9) be satisfied, and let $P^*$ denote a solution to (8). Then, if DM1 employs the policy

$\gamma_1^*(v) = u^t + P^* (v - v^t),$   (11a)

and the unique minimizing decision of DM2 is

$v^+(\alpha^+) = \arg\min_{v \in V} J(\gamma_1^*(v), v, \alpha^+),$   (11b)

$J(\gamma_1^*(v^+), v^+, \alpha^\circ)$ agrees with $J(u^t, v^t, \alpha^\circ)$ to third order in $\alpha^+$ when it lies in a sufficiently small neighborhood of $\alpha^\circ$. Equivalently, the discrepancy in costs is of fourth order. □
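The fourth-order claim of Proposition 1 can be observed on a small case in which the cost is not affine in the parameter. The following is a sketch under stated assumptions only: the scalar cost `cost`, the nominal value `ALPHA_0 = 1`, and the helpers are hypothetical and are not taken from the paper. For this cost the team solution is $u^t = v^t = 2/3$, the mixed derivatives at the nominal point are $J_{\alpha u} = -2$ and $J_{\alpha v} = -4$, so (8) gives $P^* = -2$.

```python
# Hypothetical scalar cost, strictly convex in (u, v) and not affine in alpha.
ALPHA_0 = 1.0
U_T = V_T = 2.0 / 3.0      # minimizer of cost(u, v, ALPHA_0)
P_STAR = -2.0              # solves J_au * P + J_av = 0 with J_au = -2, J_av = -4


def cost(u, v, alpha):
    return (u - alpha) ** 2 + (v - alpha ** 2) ** 2 + u * v


def follower_response(alpha_plus, p, lo=-10.0, hi=10.0, iters=200):
    """DM2's minimizer of cost(U_T + p*(v - V_T), v, alpha_plus), by ternary search."""
    def perceived(v):
        return cost(U_T + p * (v - V_T), v, alpha_plus)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if perceived(m1) <= perceived(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


def cost_gap(alpha_plus, p):
    """Loss in the true team cost caused by DM2 acting on alpha_plus."""
    v_plus = follower_response(alpha_plus, p)
    u_plus = U_T + p * (v_plus - V_T)
    return cost(u_plus, v_plus, ALPHA_0) - cost(U_T, V_T, ALPHA_0)


if __name__ == "__main__":
    for delta in (0.05, 0.1, 0.2):
        print(delta, cost_gap(ALPHA_0 + delta, P_STAR), cost_gap(ALPHA_0 + delta, 0.0))
```

For this assumed cost the gap under `P_STAR` works out to $\delta^4/3$, with $\delta = \alpha^+ - \alpha^\circ$, whereas a constant policy (`p = 0`) gives a gap of $(2\delta + \delta^2)^2$, which is of second order.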


When  the  objective  function  J(u,v,.t'j  is  affine  in 
4, we  can  obtain  more  explicit  results.  Specifically, 
let 

J(u,v,.)  *  g<u,v)  ■*“  «n(u,v»  (12a) 


where  g  a.u  h  are  continuously  dif ferenciuole  in  their 
arguments , 


h  iuC,v''  r  0,  (12b) 

ana  Z  is  strictiv  convex  in  (u,v)  for  all  .£  A.  Then, 

( S )  r eacs 

l»uiu',v")P  1-  hviuc.w')  -  0,  (13) 

a  solution  to  which  always  exists  because  (12b)  becomes 
equivalent  to  (9).  Hence  v£,  as  given  by  (5)  [evaluated 
at  i »  iP )  is  zero.  This,  in  turn,  implies  through  an 
iterative  verification  that  the  vector  dnv*‘(it‘)  /du+n, 
where  .2*)  is  given  by  (lib),  vanishes  at  i°,  for 
all  n*  1,2,...,  simply  because  the  second  term  in  (i) 

J  P  +  J  -  h  (u,v)P  ♦  h  (u.v) 

lU  IV  u  v 

is  not  explicitly  dependent  on  a.  Since  the  nth  order 
sensitivity  function 

I^Ci5)  -  dnJ (uL  a°)  /da"*"  ^  ^  (Id) 

i  *a 

carries  only  (dv^U*”) /da  )  ,  i*l,2,...,n  as  pro- 

duct  terms,  which  are  all  zero  whenever  ?  is  chosen  to 
satisfy  (13),  it  follows  that  sensitivity  functions  of 
ail  orders  vanish,  at  P .  Hence, 


team-optimal  solution  ror  minimizing  .  tv 

choosing  ?  such  that  ?’  transforms  the  vec t  r  V.  t 
t-h.1): 


?  ’h  *  < u ” ,  v")  *  -h  ’  (  j'"  , 
This  same  choice  of  ?*  transforms  i 
P’g^u'.v';  -  -g./uV. 


V.  EXTENSION  TO  THE  MULT I PARAMETER  CASE 


In  the  previous  sections,  we  have  restricted  cur 
discussion  to  the  case  i€ACR.  Vhen  ACRr,  the  first- 
order  sensitivity  function  l  ■  (  P)  becomes  a  ■ r 
vector  given  by 

I.(i°)  -  dJi-j*,v~,i3)/Aj~ 

X  m  X 

■  r  J  (u  ’ ,  v  ”  ,  x  ° )  P  ♦  Z  (a‘tv‘,/!,v‘  '  .  .-a 

'  u  v  *  i 

where 

v:  -  -rp’j  ?-*■?•  j  -j  ?+z  ' ~ :  ?-;  ■  .3b- 

t  uu  VU  IV  W  u  i  Vi' 

and  the  arguments  are  evaluated  at  u*u",  v*  v\  ■ 
Note  that  I,<aJ)  *3,  in  view  of  (6).  Furthermore, 
vc  -  0  'zero  matrix)  if 


Preposition  2:  ’Then  the  objective  function  is  given  by 
•12a>,  under  the  condition  (12b),  let  ?*  be  any  solu¬ 
tion  of  >13).  Then,  if  (lla)  is  employed  bv  0M1,  the 
response  of  DM2  (i.e.,  (lib))  is  independent  of  P , 
and  v~  »  .  Hence,  Jfu*“, v*, aJ>  *  J(us  .v*-,  a for  all 

•  *S  A,  tnat  is  the  jveraxl  performance  is  .rvcepencenc 
of  the  perception  of  DM2  regarding  the  value  of  x.  z 

In  the  next  section  we  provide  a  geometric  inter¬ 
pretation  of  this  appealing  feature  of  the  linear 

'  c  lit”  -men  the  test  function  .s  ir.  if  fine  tuner  .on  :f 
the  parameter  d . 

IV.  GEOMETRIC  INTERPRETATION  OF  TOTAL  INSENSITIVE 
VHEN  THE  OBJECTIVE  FUNCTIONAL  IS 
AFFINE  IN  A  PARAMETER 


Let  the  objective  function  J  be  as  given  by  (12a) 
with  h  satisfying  condition  (12b).  Since  J  is  strictly 
convex,  the  team  solution  (u=,vc)  when  t  -  is 
obtained  (uniquely)  from 

g  (u“,vC)  +■  i° h  (uC,vC)  *  0  (15a) 

u  u 

gy(uC , vc)  a°hv(uC,vC)  »  0.  f  1 5b ) 


Postmul tiplying  (15a)  by  ?,  adding  this 
taking  the  transpose,  we  have 


(PV  +g'>  +  a  (P’h*  +  h ' ) 


0. 


(15b) ,  and 


(16) 


20 


(where $\mathcal{R}(\cdot)$ denotes the range), since then it is possible
to find an $n \times m$ matrix $P$ to make the second product
vanish, so that the sensitivity functions of orders one and two
vanish at the nominal solution point; hence, affine policies
have very appealing sensitivity properties also in the
multiparameter case. When condition (20) is not satisfied,
however, one has to minimize a suitable norm of the leading
sensitivity function with respect to the $n \times m$ matrix $P$.
This is, in general, the second-order sensitivity functional,
which is an $r \times r$ nonnegative definite matrix.

A suitable norm for minimization is, in this case, its trace.
We are now faced with an unconstrained optimization problem
on $P$, for which a closed-form solution does not in general
exist; however, numerically it is a feasible problem.

When the objective function $J$ is affine in the
parameter vector $\alpha \in R^r$, a total insensitivity result
could be established under certain conditions, by a
direct extension of the discussion of Section IV.
Towards this end, let

$J(u,v,\alpha) = g(u,v) + \alpha' h(u,v)$   (21)

where $g: U \times V \to R$, $h: U \times V \to R^r$, and $J$ is strictly
convex and continuously differentiable in $(u,v)$. Then, the
optimality conditions for $\alpha = \alpha^\circ$ are


Pictorially, the vectors $(P'g_u' + g_v')$ and $(P'h_u' + h_v')$ are
oppositely oriented when $\alpha^\circ$ is a positive scalar.
Clearly, $\alpha^\circ$ is the ratio of the magnitude of the vector
$(P'g_u' + g_v')$ to the magnitude of the vector $(P'h_u' + h_v')$.

If DM1 chooses $P$ such that (13) is satisfied, then the
magnitudes of both vectors become zero. In this case,
if $\alpha^\circ$ is replaced by $\alpha \neq \alpha^\circ$ in (16), the equation would
still hold, and $(u^c,v^c)$ satisfies

$(P'g_u' + g_v') + \alpha (P'h_u' + h_v') = 0.$   (17)

Since (17) is the condition used by DM2 to optimize $v$
(see also (2b)), he will choose $v = v^c$, no matter what
his perceived value of $\alpha$ is. Thus, DM1 achieves the


$g_u(u^c,v^c) + \alpha^{\circ\prime} h_u(u^c,v^c) = 0$   (22a)

$g_v(u^c,v^c) + \alpha^{\circ\prime} h_v(u^c,v^c) = 0$   (22b)

where $h_u$ (respectively, $h_v$) is an $r \times n$ (respectively,
$r \times m$) matrix. The optimal response of DM2, under the
policy (1) for DM1, is determined uniquely from (for
general $\alpha$)

$(P'g_u' + g_v') + \sum_{i=1}^{r} \alpha_i (P'h_{iu}' + h_{iv}') = 0$   (23)

where the subscript $i$ denotes the $i$'th component of the
corresponding vector. Now, let us assume that there
exists an $n \times m$ matrix $P$ satisfying simultaneously

$P'h_{iu}'(u^c,v^c) + h_{iv}'(u^c,v^c) = 0, \quad i = 1,\dots,r.$   (24)

Under this condition, the second term in (23) vanishes
at $v = v^c$ for all $\alpha \in A \subset R^r$, and furthermore the first
term also vanishes in view of (22a)-(22b), by basically
following the argument of Section IV. Hence, under
this particular choice of $P$, $v = v^c$ is the unique solution
to (23) for all values of $\alpha$; that is, the optimal
response of DM2 is independent of his perception of
the value of $\alpha$, provided that strict convexity of $J$ is
preserved. To summarize,


where $f_i: R_+ \times R_+ \to R_+$ is a continuous function of $x_{i0}$
and $u_i$, and is strictly increasing in its second argument.
Here, $x_{i0}$ denotes the initial armament level of
DMi.

In order to obtain some explicit results, let us
adopt the quadratic objective functional model proposed
by Simaan and Cruz [Simaan and Cruz (1975a) and Simaan
and Cruz (1975b)], because of its analytical tractability
and other appealing features in relation with
other existing models; namely, let

J.  (x,  ,x,,u,  ,u„)  -  T  ■a.Cu.-'-'  )2  +  0°(x.-S  x.-v  r  (27) 

i  1  2  1  -  2  ill  iii;i 

and 


Proposition 3: When the objective function is given
by (21), let there exist a solution to (24), to be
denoted $P^*$. Then, if the policy

$\gamma(v) = u^c + P^*(v - v^c)$


X  .  -  f  .  (x  .  ,uj  ■  2  .  X  .  +M  .  , 
I  1  LO  i  i  10  I* 


i— 1,2;  j* 1 , 2 ;  ipj 


(2S) 


where 


R  '  0,  or  ■  0,  S.  j  0,  0 

i  i  -  I 


i-1,2. 


is employed by DM1, the response of DM2 (which is (11b)
with $\alpha \in R^r$) is independent of $\alpha$, and $v^* = v^c$.
Consequently, $J(u^*,v^*,\alpha^\circ) = J(u^c,v^c,\alpha^\circ)$ for all
$\alpha \in A \subset R^r$.
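
The solvability requirement behind Proposition 3 can be written compactly by stacking the $r$ conditions in (24). The block below is a worked restatement in linear-algebra terms, not the paper's own derivation; the pseudoinverse formula and the two-parameter scalar case are standard facts added for illustration, and the range inclusion is presumably the kind of range condition referred to in Section V.

```latex
% Stacked form of condition (24): h_u is r x n, h_v is r x m, P is n x m,
% and h_{iu}, h_{iv} denote the i-th rows of h_u and h_v.
\[
  P' h_{iu}' + h_{iv}' = 0,\quad i = 1,\dots,r
  \quad\Longleftrightarrow\quad
  h_u(u^c,v^c)\,P = -\,h_v(u^c,v^c).
\]
% This linear equation in P is solvable if and only if every column of
% h_v lies in the range of h_u:
\[
  \mathcal{R}(h_v) \subseteq \mathcal{R}(h_u),
  \qquad\text{in which case}\qquad
  P^{*} = -\,h_u^{+}\,h_v
\]
% is one solution, h_u^{+} being the Moore-Penrose pseudoinverse.
% With scalar decisions (n = m = 1) and r = 2 parameters,
\[
  \begin{pmatrix} h_{1u} \\ h_{2u} \end{pmatrix} P
  = -\begin{pmatrix} h_{1v} \\ h_{2v} \end{pmatrix}
  \quad\text{has a solution only if}\quad
  h_{1u}\,h_{2v} = h_{2u}\,h_{1v},
\]
% which fails generically: one scalar gain cannot cancel two
% independent parameter directions.
```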

In the next section we consider an example that
involves arms race between two nations, which serves
to illustrate some of the ideas generated in this and
the previous sections. Another example from microeconomics
can be found in [Cansever, Başar, and Cruz
(1983)].


Here, $x_{i0}$ denotes the given initial armament level of
DMi and expression (27) reveals the fact that each DM
wants to reduce the gap that exists between his armament
level and a linear function of the other DM's
armament level, and at the same time wishes to minimize
his expenditure. We refer to [Simaan and Cruz (1975b)]
for an elaborated interpretation of (27). Under this
set up, there exists a unique pair $(u_1^c,u_2^c)$, minimizing
$J(x_1,x_2,u_1,u_2)$ as a function of $(u_1,u_2)$, which
corresponds to the pair $(u^c,v^c)$ in the general discussion of
Sections II and V.


VI. AN EXAMPLE FROM THE PROBLEM OF ARMAMENT
LIMITATION AND CONTROL

In their papers on armament race and control
[Simaan and Cruz (1975a) and Simaan and Cruz (1975b)],
Simaan and Cruz have modeled the arms race problem as
a noncooperative differential game between two nations.

A salient feature of this model is that, when the
respective cost functionals are taken to be quadratic
in the decision variables, the resulting optimal state
trajectory yields a discretized version of the armament
model proposed earlier by Richardson [Richardson (1960)].
We will consider here the case when the two nations, DM1
and DM2, have agreed to reduce their respective armament
expenditures. Such a situation inevitably requires the
presence of an element of cooperation between DM1 and
DM2, since any significant departure from the armament
level jointly agreed upon may eventually lead to the
original high armament expenditure. Towards the formulation
of this problem, let us assume that the goals
of the DM's can be represented by two objective functionals
$J_i(x_1,x_2,u_1,u_2)$, $i=1,2$, wherein DMi aims to
optimize $J_i$. In order to incorporate the cooperation
element discussed above, we will adopt the Pareto
optimal equilibrium concept, which will be realized
[Schmitendorf and Leitmann (1974)] if the DM's jointly
optimize

$J(x_1,x_2,u_1,u_2) = k_1 J_1(x_1,x_2,u_1,u_2) + k_2 J_2(x_1,x_2,u_1,u_2)$   (25)

where $k_i \in R_+$; $u_1 \in R_+$ and $u_2 \in R_+$ denote DM1 and DM2's
armament investments, respectively, and $x_i$ represents
the armament level of DMi, $i=1,2$, which further
satisfies

$x_i = f_i(x_{i0}, u_i), \quad i=1,2.$   (26)


As it may be the case, one of the DM's, say DM2,
may deviate from $u_2^c$. The reason behind such a move
may be that DM2 totally ignores the cooperation, and
minimizes his own objective functional. Assuming that
each DM can monitor the decisions of his adversary,
this situation would immediately give rise to a Nash
equilibrium with high armament expenditures. Since we
have assumed that each DM desires to reduce his expenditures
while maintaining a certain balance of powers,
such a unilateral and large deviation will be unlikely.
In its stead, we will assume that DM2 may have an
incentive to perform a relatively small deviation from
the Pareto equilibrium point, being motivated by one of
the following three considerations:

i) DM2 may decide to promote his relative importance
in the agreement, which is reflected by an
increase in the value of $k_2$ from $k_2^\circ$ to $k_2$, without
informing DM1, while DM1 still uses the value $k_2^\circ$ in
his objective functional;

ii) DM2 may develop a different perception of the
values of one or more coefficients in the team objective
functional without informing DM1. Let us assume,
for instance, that DM2 has decided to place higher
priority and emphasis on reducing the gap between his
armament level and the linear functional of DM1's armament
level than on minimizing his expenditure; more
precisely, that he has decided to increase the value
of $Q_2$ from $Q_2^\circ$ to $Q_2$;

iii) Both i) and ii) may be present.

We now analyze these three cases separately.

Case i. This is similar to the analysis of Section
IV. The optimal strategy for DM1, which leads to
$(u_1^c,u_2^c)$ as final outcome, independent of possible
deviations in DM2's perception of $k_2$, is given by

Case ii. The solution here again follows from the
analysis of Section IV. Hence, there exists an optimum
insensitivity strategy realizing the team solution
independent of DM2's different perceptions of $Q_2$, and
such a strategy is given by

•,Ui)(u,)  -  i/'f— 4-  (u,-uh-  (31) 

i  -  1  sn  -  - 


Case iii. This case involves multiparameters where
condition (20) is not satisfied. Hence, within the
class of affine policies, there does not exist any
element which makes the cost of DM1 completely insensitive
to discrepancies in DM2's perceptions in more
than one parameter. In order to overcome this difficulty,
we adopt, as discussed in Section V, the
scalarized sensitivity function $\mathrm{Tr}[I_2(k_2^\circ,Q_2^\circ)]$, and
minimize it subject to the constraint that the strategy
of DM1 is given by

$\gamma^{(iii)}(u_2) = u_1^c - P^{(iii)}(u_2 - u_2^c).$   (32)


of his subordinates. There is an underlying goal, or
objective, which involves a successful completion of
a mission or task (such as multi-object tracking and
fire control), and this goal is determined by the DM's
at the top of the hierarchy in rather general terms
(i.e., not in fine detail), which is then transmitted
to the relevant DM's at the lower levels.

Hence, in a general framework, a system
involves a team of DM's who act in an uncertain
environment, and who have limitations on control and
communication capabilities. However, realistically,
this is not strictly a team problem, because, in an
uncertain environment, it is unlikely that every DM
will develop precisely the same perception of the
ultimate goal in every fine detail. In fact, in order
to model systems as team problems, it is absolutely
necessary that all DM's have exactly the same perception
of an existing common goal and quantify this
perception in exactly the same way. Any discrepancy
that exists between the perceptions of the DM's on the
underlying common goal will lead to a decision problem
which cannot be treated as a team problem, and optimal
decision rules derived by totally ignoring (or overlooking)
this aspect of the problem are apt to lead
to outcomes which are extremely sensitive even to small
variations in the perceptions with regard to real
underlying goals of the mission. The approach developed
in this paper remedies this deficiency because it
takes into account the possibility that the DMs' perceptions
of the "team goal" may deviate from the
nominal set by the highest level decision making unit.


This problem can be shown to admit a unique solution
which can be obtained explicitly. Hence, when DM1 is
uncertain about DM2's perception of both $k_2$ and $Q_2$,
there still exists an affine strategy which minimizes
an appropriate scalar function representing the sensitivity
of DM1's incurred cost with respect to deviations
in these coefficients from their nominal values,
and such a strategy is given by (32).

In the preceding analysis, $P^{(i)}$ is the same coefficient
as DM1 would have used in his strategy in a
Stackelberg game with DM2 being the follower and DM1
enforcing the point $(u_1^c,u_2^c)$. On the other hand, in
case ii, by announcing a strategy of the form (31),
DM1 makes DM2's objective functional independent of
the uncertain coefficient $Q_2$. Therefore, DM2's discrepancies
do not affect the team solution anymore.
However, when the number of uncertain coefficients is
large as compared with the dimension of DMs' decision
vectors, there still exists a compromise, which is to
minimize the cumulative effect of variations of uncertain
parameters around their nominal values:
$\gamma^{(iii)}$ is designed to perform such a compromise.

VII.  CONCLUDING  REMARKS 

In this paper we have introduced the notion of
optimum minimum sensitivity incentive policies in team
decision problems wherein one member of the team has a
somewhat different perception of the common goal than
the other one, and we have derived explicit incentive
policies which render the incurred value of the team
objective functional least sensitive to, and in some
cases even independent of, the discrepancies described
above.


Two possible extensions of the general approach
of this paper are to dynamic multi-stage decision
problems and to stochastic team problems. In the
latter case a natural source of discrepancy is the
a priori statistical information which is normally
assumed to be shared by the DM's. A recent reference
[Başar (1983)] addresses the question of existence of
suitable equilibrium solutions for such problems when
there is discrepancy in the subjective probability
measures characterizing the probability space.
Derivation of minimum sensitivity incentive policies
in this context is currently under study.

Abstract. In this paper we consider a general class of stochastic incentive decision
problems in which the leader has access to the control value of the follower and to
private as well as common information on the unknown state of nature. The follower's
cost function depends on a finite number of parameters whose values are not known
accurately by the leader, and in spite of this parametric uncertainty the leader seeks
a policy which would induce the desired behavior on the follower. We obtain such robust
policies for the leader, which are smooth, induce the desired behavior at the nominal
values of these parameters, and furthermore make the follower's optimal reaction either
minimally sensitive or totally insensitive to variations in the values of these parameters
from the nominals. The general solution is determined by some orthogonality
relations in some appropriately constructed (probability) measure spaces, and leads to
particularly simple incentive policies. The features presented here are intrinsic to
stochastic decision problems and have no counterparts in deterministic incentive problems.

Keywords.  Stochastic  systems;  economic  systems;  team  theory;  decision  theory;  game 
theory;  optimization. 


I. INTRODUCTION

In this paper we consider a general class of
stochastic Stackelberg game problems (equivalently,
incentive decision problems, in our context) in
which the leader has access to the control value of
the follower and to private as well as common
information on the unknown state of nature, whereas
the follower has access to only common information
which is shared by the leader. It is further
assumed that the follower's cost function (which is
strictly convex, but not necessarily quadratic)
depends on a number of parameters whose values are
not known accurately by the leader. The objective
is to obtain robust incentive policies (decision
rules) for the leader which would induce the
desired (by the leader) behavior on the follower at
the nominal values of these parameters, and be
minimally sensitive to deviations in the values of
these parameters.

For a rough mathematical (symbolic) description of
the problem, let $U$ and $V$ be the decision spaces of
the leader and the follower, respectively, $x \in X$
denote the random state of nature, and $z \in Z$ and
$y \in Y$ denote the common (to both DMs) and private
(to the leader) information related to $x$. We are
assuming at this point that $X$, $Z$ and $Y$ are endowed
with sufficiently rich topology so that probability
measures can be defined on their subsets. Let $\Gamma_f$
denote the class of all mappings from $Z$ into $V$
(satisfying some smoothness conditions to be
delineated later), and $\Gamma$ be the class of all
mappings from $V \times Z \times Y$ into $U$. Furthermore let $\Gamma_s$
be the class of mappings from $Z \times Y$ into $U$. We will
denote generic elements of $\Gamma_s$, $\Gamma_f$ and $\Gamma$ by $u$, $v$
and $\gamma$, respectively, so that $u = u(z,y)$, $v = v(z)$,
$\gamma = \gamma(v,z,y)$. Let us assume that there is a point
$(u^c,v^c)$ in $\Gamma_s \times \Gamma_f$ which is most desirable by the
leader (such as a system trajectory or control
trajectory) and he seeks to determine a policy $\gamma \in \Gamma$
which would force the follower to such an action
that the resultant outcome $(u,v)$ in the product
space $\Gamma_s \times \Gamma_f$ is sufficiently close to $(u^c,v^c)$, by
also taking into consideration the fact that the
cost function of the follower is not known
accurately.

Let $L(u,v,x,\alpha)$ denote the loss function of the
follower, depending on a set of parameter values
$\alpha \in A$. Let

$J(u,v,\alpha) = E\{L(u(z,y),v(z),x,\alpha)\}$

where $E$ is the expectation over the statistics of
the random variables $x$, $z$ and $y$. (Note that we
have abused the notation here and have used the
same notation for both random variables (or vectors)
and their realizations.) Likewise we introduce

$J(\gamma,v,\alpha) = E\{L(\gamma(v(z),z,y),v(z),x,\alpha)\}$

which we call the cost function of the follower.

We assume that, for each $\alpha \in A$ and $x \in X$, $L(u,v,x,\alpha)$
is strictly convex in the pair $(u,v) \in \Gamma_s \times \Gamma_f$, and
furthermore, $\Gamma$ is structured so that for each $\gamma \in \Gamma$
and $\alpha \in A$, $J(\gamma,v,\alpha)$ is strictly convex in $v$.

Under these assumptions, to each $\gamma \in \Gamma$, and for
fixed $\alpha \in A$, there corresponds a unique element of
$\Gamma_f$, called $v_\gamma$, which will minimize $J(\gamma,v,\alpha)$ over
$v \in \Gamma_f$. [We are assuming here that, as a rational
decision maker, the follower chooses his optimal
policy (or rational reaction) by minimizing the
function $J(\gamma,v,\alpha)$ where $\alpha$ is the true value
characterizing his cost functional.] Hence, we
have the unique correspondence (for each fixed
$\alpha \in A$):


where the mapping $T_\alpha$ depends explicitly on $\alpha$ and
involves a minimization operation.


Let us further note that this unique relationship
necessarily yields a unique element in $\Gamma_s$, given by

$u(z,y) = \gamma(v(z),z,y)$,

and hence we can associate, with each $\gamma \in \Gamma$, a unique
pair $(u,v)$:

$\gamma \xrightarrow{\;S_\alpha\;} (u,v)$


this point $(u^c,v^c)$ is induced by the leader by a
linear (in $v$) policy

$\gamma(v,z,y) = u^c(z,y) - Q(z,y)[v - v^c(z)]$   (2.2)

whenever

$E_{x/y,z}\{L_u(u^c(z,y),v^c(z),x,\alpha^*)\} \neq 0$   (2.3)

where the mapping is denoted, in this case, by $S_\alpha$.

This is the familiar input-output relationship of
system theory (with the mapping $S_\alpha$ being much more
complicated than the one normally encountered in
system theory) and hence we can call $\Gamma$ the input
space and $U \times V$ the output space. In terms of the
familiar jargon of system theory we can rephrase the
problem posed in the earlier part of this section
as follows:

Problem A

For a given nominal value $\alpha^* \in A$, find an input
(control) $\gamma$ which would drive the system output to
a desired value $(u^c,v^c)$. [Note that the desired
output is in fact a stochastic variable (or process);
hence what we pose here is akin to stochastic
controllability.]

The solution to Problem A has in fact been obtained
in [1]; an important feature is that it is generally
nonunique, even in the class of policies (for the
leader) which are linear in $v$. This then prompts a
second question (additional design criterion)
related to the robustness of the "optimal" $\gamma$:

Problem B

Among the control inputs which solve Problem A,
which one (ones) leads (lead) to outputs that are
least sensitive to variations in the value of $\alpha$ from
the nominal value $\alpha^*$?

In this paper we obtain a complete solution to this
problem by using some ideas originally introduced
in [2] for the deterministic version of the problem.
The features of the solution for the stochastic
problem are, however, inherently different from
those of the deterministic problem, which will be
pointed out as we go along.

In Section II we will present a complete solution
to the scalar problem, while extension to the vector
case has been included in the fuller version [3].

The theory developed in Section II is illustrated
in Section III via a numerical example motivated by
a problem that arises in the control of large organizations
[3]. Section IV discusses possible extensions
to other classes of related problems.

II.  MAIN  RESULTS 

In this section we assume that the spaces $U$, $V$, $X$,
$Y$, $Z$ are 1-dimensional Euclidean, and $\alpha$ is a single
parameter with nominal value (as perceived by the
leader) $\alpha^*$. The actual value of $\alpha$ may be in a
small neighborhood of $\alpha^*$, and we assume that
$L(u,v,x,\alpha)$ is strictly convex in $(u,v)$, and twice
continuously differentiable in $u$, $v$ and $\alpha$, when $\alpha$ is
restricted to lie in this neighborhood. Furthermore,
we assume that every $\gamma \in \Gamma$ is continuously differentiable
in $v$. The random variables $x,y,z$ are jointly
second-order random variables, and $\gamma$, in addition
to being continuously differentiable in $v$, is
measurable in $z$ and $y$, and $u,v$ are also measurable
in their arguments. Furthermore, if $L$ is measurable
in $x$, the expectation of $L$, written as

$E\{L(\gamma(v(z),z,y), v(z), x, \alpha)\}$,   (2.1)

is well-defined for all $\gamma \in \Gamma$, $v \in \Gamma_f$ and for
every $\alpha$ in a neighborhood of $\alpha^*$. We finally assume
that a desirable point in the product space $\Gamma_s \times \Gamma_f$
is chosen by the leader, as $(u^c,v^c)$.

If $\alpha$ indeed takes the value $\alpha^*$, the results of [1],
specialized to this scalar case, indicate that


with positive probability in $(y,z)$. Since we have
a simpler (scalar) problem here, the equation for
$Q$ can be obtained directly. Substituting (2.2)
into (2.1) we have

$E_z\{E_{x,y/z}\{L(u^c - Q[v - v^c],v,x,\alpha^*)\}\}$   (2.4)


which is strictly convex in $v$ since $L$ was strictly
convex in $(u,v)$ and $\gamma$ is linear in $v$. Hence (2.4)
admits a unique minimum in $\Gamma_f$, which is obtained by
differentiating the inner expression with respect
to $v$, for fixed $z$, and setting equal to zero:

$E_{x,y/z}\{-L_u Q + L_v\} = 0$

$\Rightarrow \; E_{y/z}\{Q(z,y) E_{x/y,z}\{L_u(u^c - Q[v - v^c],v,x,\alpha^*)\}\}$
$\qquad = E_{x,y/z}\{L_v(u^c - Q[v - v^c],v,x,\alpha^*)\}.$   (2.5)


This is the equation that the minimizing $v$ would
satisfy (this is also sufficient because of strict
convexity), and it is in general nonlinear in $v$.
However, we do not need to solve this for $v$, but
simply find a $Q$ such that its unique solution is
$v^c$, which would in turn yield $\gamma(v^c,z,y) = u^c$, in
view of (2.2), and thereby the desired solution
would be induced. Now, substituting $v = v^c$ in (2.5)
we obtain

$E_{y/z}\{Q(z,y)F(z,y)\} = G(z)$   (2.6)

where

$F(z,y) = E_{x/y,z}\{L_u(u^c(z,y),v^c(z),x,\alpha^*)\}$

$G(z) = E_{x,y/z}\{L_v(u^c(z,y),v^c(z),x,\alpha^*)\}.$

If this had been a deterministic problem, the
solution to (2.6) would be unique, thus outruling
any possibility of obtaining optimal policies that
satisfy other design criteria. For the stochastic
problem, however, (2.6) admits infinitely many
solutions, with a family of such solutions being
(which turns out to be sufficiently rich):

$Q(z,y) = g(z,y)G(z) / [E_{y/z}\{g(z,y)F(z,y)\}]$   (2.8)

where $g$ is any function measurable in $(z,y)$ satisfying
the condition

$E_{y/z}\{g(z,y)F(z,y)\} \neq 0, \quad z \in R.$   (2.9)

Verification of this result is by direct substitution
of (2.8) into (2.6). We now summarize this
result below:


Theorem 1. Problem A formulated in Section I admits,
for the scalar version, infinitely many 3) linear
(in $v$) solutions, with one such family given by
(2.2), (2.8), under condition (2.9).
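
The direct-substitution check behind Theorem 1 can be carried out by hand on a finite example. The sketch below is a minimal illustration, assuming a fixed value of $z$, a private observation $y$ taking three values, and made-up numbers for $P(y/z)$, $F(z,y)$ and $G(z)$; the helper names `cond_mean` and `incentive_gain` are hypothetical.

```python
from fractions import Fraction as Fr

def cond_mean(p, vals):
    """E_{y/z}[vals] over a finite set of y values with probabilities p."""
    return sum(pi * vi for pi, vi in zip(p, vals))

def incentive_gain(g, F, G, p):
    """Q(y) = g(y) G / E_{y/z}[g F], i.e. formula (2.8) at one fixed z."""
    egf = cond_mean(p, [gi * fi for gi, fi in zip(g, F)])
    if egf == 0:
        raise ValueError("condition (2.9) fails: E[gF] = 0")
    return [gi * G / egf for gi in g]

# Hypothetical numbers for one fixed z.
p = [Fr(1, 2), Fr(1, 4), Fr(1, 4)]   # P(y/z)
F = [Fr(2), Fr(-1), Fr(3)]           # F(z, y)
G = Fr(5)                            # G(z)

# Two different admissible choices of g give two different gains Q ...
Q_a = incentive_gain([Fr(1), Fr(1), Fr(1)], F, G, p)
Q_b = incentive_gain([Fr(1), Fr(2), Fr(-1)], F, G, p)

# ... and both satisfy (2.6): E_{y/z}[Q F] = G.
lhs_a = cond_mean(p, [q * f for q, f in zip(Q_a, F)])
lhs_b = cond_mean(p, [q * f for q, f in zip(Q_b, F)])
```

Any `g` with a nonzero conditional mean of `g * F` yields another member of the family, which is the freedom later used to address Problem B.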


2) Here $L_u$ is the partial derivative of $L$ with
respect to $u$, and $E_{x/y,z}$ denotes the conditional
expectation over the statistics of $x$ given $(y,z)$.

3) Note that the existence of infinitely many
solutions is mainly due to the fact that the leader
was also allowed to acquire private information $y$
(be it correlated or uncorrelated with the common
information $z$). If this had not been the case,
then $g$, being only a function of $z$, would cancel
out in (2.8), thus leading to a unique solution.


-nen  ,oe: la 


In view of this nonuniqueness feature of the solution
to Problem A, Problem B becomes relevant, which
we address in the sequel. Towards this end, let
us assume that the leader adopts the policy (2.2)
with $Q$ chosen as in (2.8) and $g$ being arbitrary
(but satisfying (2.9)). For any such $g$, this is an
optimal policy inducing $(u^c,v^c)$, provided that $\alpha = \alpha^*$.
If $\alpha \neq \alpha^*$, however, the follower's reaction to $\gamma$ will
no longer be $v = v^c$. In fact, substituting (2.2) into
(2.1) with $\alpha^*$ replaced by a general $\alpha$, and differentiating
the resulting expression with respect to $v$,
we obtain (to replace (2.5))

$E_{y/z}\{Q(z,y) E_{x/y,z}\{L_u(u^c - Q[v_\alpha - v^c],v_\alpha,x,\alpha)\}\}$
$\qquad = E_{x,y/z}\{L_v(u^c - Q[v_\alpha - v^c],v_\alpha,x,\alpha)\}$   (2.10)

which admits a unique solution $v_\alpha(z)$ when $\alpha$ lies in
a neighborhood of $\alpha^*$ (because of strict convexity
of $L$). This solution is not obtainable explicitly,
unless we specify a structure for $L$ (such as quadratic);
however, we in fact do not need an explicit
expression for $v_\alpha(z)$, as the following discussion
reveals.

The solution $v_\alpha(z)$ to (2.10) will, in general,
depend on different choices of $Q$ out of the family
(2.8)-(2.9). What Problem B alludes to is a choice
which will render the difference $|v_\alpha(z) - v^c(z)|$
sufficiently small (in norm) whenever $\alpha$ is close to
$\alpha^*$. Note that, if $v_\alpha(z)$ is close to $v^c(z) = v_{\alpha^*}(z)$,
then $u_\alpha(z,y) = \gamma(v_\alpha,z,y)$ will be close to $u^c(z,y) =
u_{\alpha^*}(z,y)$, because of continuity properties of $\gamma$.
Hence, as a measure of the closeness of $v_\alpha(z)$ to
$v^c(z)$, we now take the first order term $dv_\alpha/d\alpha$ and
evaluate it at $\alpha = \alpha^*$.

Since (2.10) is an identity for all $\alpha$ of interest,
we could differentiate it with respect to $\alpha$ (for
each fixed $z$), to obtain the equality (at $\alpha = \alpha^*$):

$E_{y/z}\{ Q(z,y)(dv^c/d\alpha) E_{x/y,z}\{L_{uv}(u^c,v^c,x,\alpha^*)\}$
$\qquad - Q^2(z,y)(dv^c/d\alpha) E_{x/y,z}\{L_{uu}(u^c,v^c,x,\alpha^*)\}$
$\qquad + Q(z,y) E_{x/y,z}\{L_{u\alpha}(u^c,v^c,x,\alpha^*)\} \}$
$= E_{x,y/z}\{ (dv^c/d\alpha) L_{vv}(u^c,v^c,x,\alpha^*)$
$\qquad - Q(z,y)(dv^c/d\alpha) L_{vu}(u^c,v^c,x,\alpha^*)$
$\qquad + L_{v\alpha}(u^c,v^c,x,\alpha^*) \}$


$f_1(z,y) = F(z,y) E_{x,y/z}\{L_{v\alpha}(u^c,v^c,x,\alpha^*)\}
 - G(z) E_{x/y,z}\{L_{u\alpha}(u^c,v^c,x,\alpha^*)\}.$

$\Gamma_g = \{ g \in \Gamma_s : g \text{ satisfies (2.9)} \}.$   (2.15)

Then it follows from the above discussion that if
there exists a $g \in \Gamma_g$, satisfying (2.13), the
corresponding policy for the leader renders the
first-order sensitivity function zero, i.e., to
first-order $v_\alpha$ (and consequently $u_\alpha$) becomes insensitive
to variations in the value of $\alpha$, from the
nominal value $\alpha^*$. But, such a $g \in \Gamma_g$ exists
generically: choose any random variable that is
orthogonal to $f_1 \in \Gamma_s$ under the conditional probability
measure $P(y/z)$, but not orthogonal to $F(z,y)$,
which will be possible as long as $F$ is not linearly
dependent on $f_1$ under $P(y/z)$. A sufficient condition
for this to be true is, for every $k \in \Gamma_f$,

$E_{x/y,z}\{L_{u\alpha}(u^c,v^c,x,\alpha^*)\} + k(z) E_{x/y,z}\{L_u(u^c,v^c,x,\alpha^*)\} \neq 0$   (2.16)

A candidate solution to (2.13) is then,

$g(z,y) = g_0(z) - g_1(y)$

$g_0(z) = \dfrac{E_{y/z}\{g_1(y) f_1(z,y)\}}{E_{y/z}\{f_1(z,y)\}}$   (2.17)

which can easily be shown to satisfy (2.13) for any
$y$-measurable $g_1(y)$. Hence, we in fact have a family
of solutions, parameterized by $g_1$. Note that to
assume $E_{y/z}\{f_1(z,y)\} \neq 0$ here is not restrictive
because if that is not the case (2.13) is trivially
satisfied (by, for example, choosing $g$ to be a
constant).

Theorem 2. Let (2.3) and (2.16) be satisfied.
Then, the first order sensitivity function $dv^c/d\alpha$
can be made identically zero by an incentive policy
of the form (2.2) where $Q$ is given by (2.8) and
$g \in \Gamma_g$ satisfies (2.13), with one family of candidates
being (2.17).

Proof. The proof follows by construction, from the
discussion preceding the theorem.
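
The construction in Theorem 2 can be exercised on a small finite model. The sketch below is an illustration under assumptions that are not from the paper: $z$ is held fixed, $y$ takes three values, $x$ is a deterministic function of $y$ (so the inner conditional expectation is trivial), and the loss is the hypothetical quadratic $L = (u - \alpha x)^2/2 + (v - x)^2/2 + uv/2$. For this loss $L_{v\alpha} = 0$ and $L_{u\alpha} = -x$, so $f_1 = G x$.

```python
from fractions import Fraction as Fr

# Assumed data: probabilities of y, state x(y), desired actions, nominal alpha.
P_Y = [Fr(1, 2), Fr(1, 4), Fr(1, 4)]
X = [Fr(1), Fr(-1), Fr(2)]
U_C = [Fr(1), Fr(0), Fr(2)]     # desired leader action u^c(y)
V_C = Fr(1)                     # desired follower action v^c
A_STAR = Fr(1)

def mean(vals):
    return sum(p * v for p, v in zip(P_Y, vals))

# Partial derivatives of the assumed loss.
def L_u(u, v, x, a):
    return u - a * x + v / 2

def L_v(u, v, x, a):
    return v - x + u / 2

F = [L_u(u, V_C, x, A_STAR) for u, x in zip(U_C, X)]
G = mean([L_v(u, V_C, x, A_STAR) for u, x in zip(U_C, X)])

def gain(g):
    """Q = g G / E[g F], formula (2.8); requires condition (2.9)."""
    egf = mean([gi * fi for gi, fi in zip(g, F)])
    assert egf != 0
    return [gi * G / egf for gi in g]

def follower_response(Q, a):
    """Root of the follower's stationarity condition, which is linear in v."""
    def phi(v):
        terms = []
        for q, u_c, x in zip(Q, U_C, X):
            u = u_c - q * (v - V_C)          # policy (2.2)
            terms.append(-q * L_u(u, v, x, a) + L_v(u, v, x, a))
        return mean(terms)
    return -phi(Fr(0)) / (phi(Fr(1)) - phi(Fr(0)))

# Candidate (2.17): g = g0 - g1(y), with g0 = E[g1 f1] / E[f1].
f1 = [G * x for x in X]
g1 = [Fr(0), Fr(1), Fr(2)]
g0 = mean([a * b for a, b in zip(g1, f1)]) / mean(f1)
g_robust = [g0 - gi for gi in g1]

Q_robust = gain(g_robust)
Q_plain = gain([Fr(1), Fr(1), Fr(1)])   # constant g, not orthogonal to f1
```

Both gains induce $v^c$ at the nominal value. In this particular quadratic example the parameter enters the stationarity condition only through $E\{Qx\}$, so the robust gain keeps the reaction at $v^c$ for every $\alpha$, which is stronger than the first-order statement of the theorem and is a feature of the assumed loss.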

Here the first order term is defined by

$dv^c(z)/d\alpha = dv_\alpha(z)/d\alpha \,|_{\alpha=\alpha^*}.$   (2.11)

Since $dv^c/d\alpha$ is $z$-measurable, we can easily solve
for it to obtain

$\dfrac{dv^c}{d\alpha} = \dfrac{E_{x,y/z}\{L_{u\alpha}Q - L_{v\alpha}\}}{E_{x,y/z}\{L_{uu}Q^2 - 2L_{uv}Q + L_{vv}\}}$   (2.12)

where $L_{u\alpha}$, $L_{v\alpha}$, $L_{uu}$, $L_{uv}$, $L_{vv}$ all have $(u^c,v^c,x,\alpha^*)$ as their
arguments. Note that since $L$ was strictly convex
in $(u,v)$, the denominator of (2.12) is always positive
and hence $dv^c/d\alpha$, which we will henceforth
call the first order sensitivity function, is
well-defined.

Now, the first objective is to make this expression
identically zero, by an appropriate choice of $g(z,y)$
in (2.8). Substituting (2.8) into the numerator of
(2.12) and reshuffling some terms, we obtain the
condition

$E_{y/z}\{g(z,y)f_1(z,y)\} = 0$   (2.13)

An expression for the second-order sensitivity
function

$d^2v^c(z)/d\alpha^2 = d^2v_\alpha(z)/d\alpha^2 \,|_{\alpha=\alpha^*}$

can be obtained by following the lines that led to
(2.12). In this case (2.13) is replaced by (using
the fact that $dv^c/d\alpha$ can be made equal to zero)

$\dfrac{d^2v^c}{d\alpha^2} = \dfrac{E_{x,y/z}\{L_{u\alpha\alpha}Q - L_{v\alpha\alpha}\}}{E_{x,y/z}\{L_{uu}Q^2 - 2L_{uv}Q + L_{vv}\}}$

which can be made zero if and only if $g$ in
(2.8) satisfies (as a counterpart of (2.13)):

$E_{y/z}\{g(z,y)f_2(z,y)\} = 0$

where

$f_2(z,y) = F(z,y) E_{x,y/z}\{L_{v\alpha\alpha}(u^c,v^c,x,\alpha^*)\}
 - G(z) E_{x/y,z}\{L_{u\alpha\alpha}(u^c,v^c,x,\alpha^*)\}.$

Hence, for both first and second-order sensitivity
functions to be identically zero, it is sufficient
to find a $g \in \Gamma_g$ which is orthogonal to both $f_1$
and $f_2$ under the measure $P(y/z)$. But this is
generically possible because, for fixed $z \in R$, $\Gamma_s$
can be made a pre-Hilbert space under the inner
product

$$g(z,y) = 11z^2 - 22zy + 60 \tag{3.12}$$

which is unique up to a multiplicative term, which
may be a function of z, within the class of functions
affine in y. The corresponding Q(z,y) becomes

$$Q(z,y) = (11z^2 - 22zy + 60)/18. \tag{3.13}$$

In order to illustrate the robustness properties
of (3.13), the optimum reaction of the follower is
plotted against $\beta_1$ in Fig. 1. The solid line
represents the follower's optimum reaction when
g(z,y) = constant, with a corresponding Q(z,y) = [illegible].
The dashed line represents the follower's optimum
reaction to the robust strategy (3.13) when $\beta_1$
varies about $\beta_1 = 1$. At $\beta_1 = \beta_1^*$, we have $v = v^t = 0.1$ for
both strategies. However, when $\beta_1$ varies about 1,
the optimum reaction induced by (3.13) is considerably
closer to $v^t$ compared with the constant-g case. [In
this discussion, as well as in the figure, we have
taken z = 0.3, though the behavior is similar for
other values of z.]


One would be tempted to think that the robustness
properties of Q(z,y) would be further enhanced if
we chose g(z,y) to be also orthogonal to $f_2(z,y)$
of (2.19), which in this case becomes

$$f_2(z,y) = (-z + y)z \tag{3.14}$$

However, since -z + y is a linear combination of
$f_1(z,y)$ and F(z,y), any g(z,y) orthogonal to $f_1$
and $f_2$ is also orthogonal to F; therefore, such a
g(z,y) would not satisfy (2.9), thus ruling out
the possibility of obtaining more robust strategies
using the theory developed in Section II.


IV. EXTENSIONS

In this section we discuss some extensions of the
foregoing results, which have been discussed in
[3].

1) Higher (than 1) order sensitivity functions
have been considered and it has been shown that for
some classes of loss functions these can all be
made identically zero by a proper choice of $g \in \bar\Gamma^s$
in (2.8), which implies that in these cases the
linear (in v) incentive policy (2.2) is not only
locally insensitive, but also globally, i.e., it
is a robust policy.

2) Theorems 2 and 3 are extended to the case when
the parameter $\alpha \in \mathbb{A}$ is a vector, and it is shown
that the statements of these two theorems remain
(basically) intact, as long as $\alpha$ is finite-dimensional.
In this case (2.13) and (2.19) are
each replaced by $p = \dim(\alpha)$ equations, one for each
component of $\alpha$.

3) These results are also extended to the case
when U and V are finite dimensional Euclidean
spaces.




February 198^


OPTIMUM OR NEAR-OPTIMUM INCENTIVE POLICIES FOR STOCHASTIC DECISION PROBLEMS
IN THE PRESENCE OF PARAMETRIC UNCERTAINTY


by 


Derya  H.  Cansever  and  Tamer  Basar 
Decision  and  Control  Laboratory 
Coordinated  Science  Laboratory 
University  of  Illinois 
1101  W.  Springfield  Avenue 
Urbana,  Illinois  61801,  U.S.A. 


Address for Correspondence

Professor Tamer Başar
Coordinated Science Laboratory
University of Illinois
1101 W. Springfield Avenue
Urbana, Illinois 61801, U.S.A.


Research reported herein was supported in part by the Office of Naval Research
under Contract N0001-*-8A-R-0A69; and in part by the U.S. Department of Energy,
Electric Energy Systems Division, under Contract DE-ACO 1-3 IRA- 3 Oh 68, with Dynamic
Systems, P. O. Box A23, Urbana, Illinois 61801.


ABSTRACT 


In this paper we consider a general class of stochastic incentive
decision problems in which the leader has access to the control value of the
follower and to private as well as common information on the unknown state of
nature. The follower's cost function depends on a finite number of parameters
whose values are not known accurately by the leader, and in spite of this
parametric uncertainty the leader seeks a policy which would induce the desired
behavior on the follower. We obtain such policies for the leader, which
are smooth, induce the desired behavior at the nominal values of these parameters,
and furthermore make the follower's optimal reaction either minimally sensitive
or totally insensitive to variations in the values of these parameters from the
nominals. The general solution is determined by some orthogonality relations
in some appropriately constructed (probability) measure spaces, and leads to
particularly simple incentive policies. The features presented here are
intrinsic to stochastic decision problems and have no counterparts in
deterministic incentive problems.

Keywords: Stochastic systems; economic systems; team theory; decision theory;
game theory; optimization; Stackelberg games.

I. INTRODUCTION

In  this  paper  we  consider  a  general  class  of  stochastic  Stackelberg 
game  problems  (equivalently,  incentive  decision  problems,  in  our  context)  in 
which  the  leader  has  access  to  the  control  value  of  the  follower  and  to  private 
as  well  as  common  information  on  the  unknown  state  of  nature,  whereas  the 
follower  has  access  to  only  common  information  which  is  shared  by  the  leader. 

It  is  further  assumed  that  the  follower's  cost  function  (which  is  strictly 
convex,  but  not  necessarily  quadratic)  depends  on  a  number  of  parameters 
whose  values  are  not  known  accurately  by  the  leader. 

As noted in the seminal paper by Harsanyi [1], the class
of games with uncertain cost functions is in fact one of the three main
prototypes in which a game with incomplete information can arise. Incomplete
information is interpreted as lack of full information on the part of the
players about the normal form of the game, and the other two cases which
create games with incomplete information are:

(i)  Some  or  all  of  the  players  may  not  know  the  state  of  the  nature, 
or  the  outcome  of  its  evolution  as  a  function  of  their  decisions; 

(ii)  The  players  may  not  know  their  own,  or  the  other  players' 
strategy  spaces. 

It  is  shown  in  [1]  that  the  two  cases  cited  above  can  be  represented 
as  uncertainties  in  the  cost  functions,  so  that  games  with  any  type  of 
incomplete  information  can  be  treated  as  games  wherein  players  are  uncertain 
about  their  own  or  some  other  players'  cost  functions.  In  our  Stackelberg  game 
problem  with  two  players,  we  will  assume  that  both  players  have  exact  knowledge 
of their own cost functional, except for the state of nature x, about which
players have imperfect information, with the follower's information on x being
nested in that of the leader. We will also assume that the leader's
lack of knowledge about the follower's cost function can be modeled as a finite
dimensional parameter vector $\alpha$. Under the assumption that each player knows
his cost functional exactly, the asymmetry in the knowledge of $\alpha$ can affect
the leader's cost function only through the follower's decision variable,
where the latter is the outcome of an optimization problem, given the actual
values of $\alpha$. Under the adopted information structure and solution concept,
the follower's knowledge of the cost function of the leader is irrelevant to the
analysis to follow.

We let the leader have a prior estimate of $\alpha$, denoted by $\alpha^*$, which
will henceforth be referred to as the nominal value of $\alpha$. Let us suppose for the
time being that the leader knows the actual value of $\alpha$; then the outcome of
the game with this extra information on the part of the leader is called
the first best solution. The objective is to obtain "near-optimal" incentive policies
(decision rules) for the leader which would induce the first best solution,
characterized by a desired behavior (by the leader) on the follower when the
nominal and actual values of $\alpha$ coincide, and such that this outcome will be
minimally sensitive to deviations of the follower's perception of $\alpha$ from its
nominal value.

The problem will be formulated in mathematical terms in Section II.
In Section III, we will present a complete solution to the scalar problem,
which possesses the main characteristics of the issue while allowing a more
lucid presentation. In Section IV, these results will be generalized
to the case where the decision variables and uncertain parameters take values
in finite-dimensional Euclidean spaces. The theory developed will be illustrated
via a numerical example, motivated by a problem that arises in the control of
large organizations [6], in Section V. The concluding remarks of Section VI end
the paper.

II. PROBLEM FORMULATION

Let $(\Omega, \mathcal{F}, P)$ be an underlying probability space on which three random
variables x, y and z are defined, where $x \in X$ denotes the random state of
nature, and $z \in Z$ and $y \in Y$ denote, respectively, common (to both players) and
private (to the leader) information related to x. We also let U and V be the
decision spaces of the leader and the follower, respectively. We are assuming
at this point that X, Z and Y are endowed with sufficiently rich topology so
that probability measures can be defined on their subsets. Let $\Gamma_f$ denote the
class of all mappings from Z into V (satisfying some smoothness conditions to be
delineated later), and $\Gamma$ be the class of all mappings from V x Z x Y into U.
Furthermore let $\Gamma^s$ be the class of mappings from Z x Y into U. We will denote
generic elements of $\Gamma^s$, $\Gamma_f$ and $\Gamma$ by u, v and $\gamma$, respectively, so that
u = u(z,y), v = v(z), $\gamma = \gamma(v,z,y)$. Let us assume that there is a point
$(u^t,v^t)$ in $\Gamma^s \times \Gamma_f$ which is most desirable to the leader (such as a system
trajectory or control trajectory), and he seeks to determine a policy
$\gamma \in \Gamma$ which would force the follower to such an action that the resultant
outcome (u,v) in the product space $\Gamma^s \times \Gamma_f$ is sufficiently close to
$(u^t,v^t)$, the first-best solution, by also taking into consideration the fact
that the cost function of the follower is not known accurately.

Let $L(u,v,x,\alpha)$ denote the loss function of the follower, depending
on a set of parameter values $\alpha \in \mathbb{A}$. Let

$$J(u,v,\alpha) = E\{L(u(z,y), v(z), x, \alpha)\}$$

where E is the expectation over the statistics of the random variables x, z and
y. (Note that we have abused the notation here and have used the same notation
for both random variables (or vectors) and their realizations.) Likewise we
introduce

$$J(\gamma,v,\alpha) = E\{L(\gamma(v(z),z,y), v(z), x, \alpha)\}$$

which we call the cost function of the follower. We assume that, for each $\alpha \in \mathbb{A}$
and $x \in X$, $L(u,v,x,\alpha)$ is strictly convex in the pair $(u,v) \in \Gamma^s \times \Gamma_f$, and furthermore,
$\Gamma$ is structured so that for each $\gamma \in \Gamma$ and $\alpha \in \mathbb{A}$, $J(\gamma,v,\alpha)$ is strictly convex in v.

Under these assumptions, to each $\gamma \in \Gamma$, and for fixed $\alpha \in \mathbb{A}$, there
corresponds a unique element of $\Gamma_f$, called $v^\gamma$, which will minimize $J(\gamma,v,\alpha)$ over
$v \in \Gamma_f$. [We are assuming here that, as a rational decision maker, the follower
chooses his optimal policy (or rational reaction) by minimizing the function
$J(\gamma,v,\alpha)$ where $\alpha$ is the true value characterizing his cost functional.] Hence,
we have the unique correspondence (for each fixed $\alpha \in \mathbb{A}$)

$$\gamma \xrightarrow{\;S'_\alpha\;} v$$

where the mapping $S'_\alpha$ depends explicitly on $\alpha$ and involves a minimization
operation.

Let us further note that this unique relationship necessarily yields
a unique element in $\Gamma^s$, given by

$$u(z,y) = \gamma(v(z),z,y)$$

and hence we can associate, with each $\gamma \in \Gamma$, a unique pair (u,v):

$$\gamma \xrightarrow{\;S_\alpha\;} (u,v)$$

where the mapping is denoted, in this case, by $S_\alpha$.

This is the familiar input-output relationship of system theory (with
the mapping $S_\alpha$ being much more complicated than the one normally encountered in
system theory) and hence we can call $\Gamma$ the input space and U x V the output space.

In terms of the familiar jargon of system theory we can rephrase the problem
posed in the earlier part of this section as follows:

Problem A

For a given nominal value $\alpha^* \in \mathbb{A}$, find an input (control) $\gamma$ which would
drive the system output to a desired value $(u^t,v^t)$. [Note that the desired
output is in fact a stochastic variable (or process); hence what we pose here
is akin to stochastic controllability.]

The  solution  to  Problem  A  has  in  fact  been  obtained  in  [2];  an  important 
feature  is  that  it  is  generally  nonunique,  even  in  the  class  of  policies  (for  the 
leader)  which  are  linear  in  v.  This  then  prompts  a  second  question  (additional 
design criterion) related to the sensitivity properties of the "optimal" $\gamma$:

Problem  B 

Among the control inputs which solve Problem A, which one (ones) leads
(lead) to outputs that are least sensitive to variations in the value of $\alpha$ from
the nominal value $\alpha^*$?

In  this  paper  we  obtain  a  complete  solution  to  this  problem  by  using 
some  ideas  originally  introduced  in  [3]  for  the  deterministic  version  of  the 
problem.  The  features  of  the  solution  for  the  stochastic  problem  are,  however, 
inherently  different  from  those  of  the  deterministic  problem,  which  will  be 
pointed out as we go along.

III. MAIN RESULTS FOR THE SCALAR PROBLEM

In this section we assume that the spaces U, V, X, Z are 1-dimensional
Euclidean, and $\alpha$ is a single parameter with nominal value (as perceived by the
leader) $\alpha^*$. The actual value of $\alpha$ may be in a small neighborhood of $\alpha^*$, and we
assume that $L(u,v,x,\alpha)$ is strictly convex in (u,v), and twice continuously
differentiable in u, v and $\alpha$, when $\alpha$ is restricted to lie in this neighborhood.
Furthermore, we assume that every $\gamma \in \Gamma$ is continuously differentiable in v.
The random variables x, y, z are jointly second-order random variables, and $\gamma$, in
addition to being continuously differentiable in v, is measurable in z and y,
and u, v are also measurable in their arguments. Furthermore, if L is measurable
in x, the expectation of L, written as

$$E\{L(\gamma(v(z),z,y), v(z), x, \alpha)\}, \tag{3.1}$$

is well-defined for all $\gamma \in \Gamma$, $u \in \Gamma^s$, $v \in \Gamma_f$, and for every $\alpha$ in a neighborhood of
$\alpha^*$. We finally assume that a desirable point in the product space $\Gamma^s \times \Gamma_f$, as
chosen by the leader, is $(u^t,v^t)$.

If $\alpha$ indeed takes the value $\alpha^*$, the results of [2], when specialized
to this scalar case, indicate that this point $(u^t,v^t)$ is induced by the leader
by a linear (in v) policy

$$\gamma(v,z,y) = u^t(z,y) - Q(z,y)\,[v - v^t(z)] \tag{3.2}$$

whenever

$$E_{x/y,z}\{L_u(u^t(z,y), v^t(z), x, \alpha^*)\} \neq 0 \tag{3.3}$$

with positive probability in (y,z). 1)

1) Here $L_u$ is the partial derivative of L with respect to u, and $E_{x/y,z}$ denotes the
conditional expectation over the statistics of x given (y,z).

Since we have a simpler (scalar) problem
here, the equation for Q can be obtained directly. Substituting (3.2) into
(3.1) we have

$$E_z\{E_{x,y/z}\{L(u^t - Q[v - v^t], v, x, \alpha^*)\}\} \tag{3.4}$$

which is strictly convex in v since L was strictly convex in (u,v) and $\gamma$ is
linear in v. Hence (3.4) admits a unique minimum in $\Gamma_f$ which is obtained by
differentiating the inner expression with respect to v, for fixed z, and
setting equal to zero:

$$E_{x,y/z}\{L_u Q - L_v\} = 0 \;\Longleftrightarrow\; E_{y/z}\{Q(z,y)\, E_{x/y,z}\{L_u(u^t - Q[v - v^t], v, x, \alpha^*)\}\} = E_{x,y/z}\{L_v(u^t - Q[v - v^t], v, x, \alpha^*)\} \tag{3.5}$$

This is the equation that the minimizing v would satisfy (this is also
sufficient because of strict convexity), and it is in general nonlinear in v.
However, we do not need to solve this for v, but simply find a Q such that its
unique solution is $v^t$, which would in turn yield $\gamma(v^t,z,y) = u^t$, in view of
(3.2), and thereby the desired solution would be induced. Now, substituting
$v = v^t$ in (3.5) we obtain

$$E_{y/z}\{Q(z,y) F(z,y)\} = G(z) \tag{3.6}$$

where

$$F(z,y) = E_{x/y,z}\{L_u(u^t(z,y), v^t(z), x, \alpha^*)\}, \qquad G(z) = E_{x,y/z}\{L_v(u^t(z,y), v^t(z), x, \alpha^*)\}. \tag{3.7}$$

If this had been a deterministic problem, the solution to (3.6) would
be unique, thus ruling out any possibility of obtaining optimal policies that
satisfy other design criteria. For the stochastic problem, however, (3.6)
admits infinitely many solutions, with a family of such solutions being (which
turns out to be sufficiently rich):

$$Q(z,y) = g(z,y)\, G(z) \,/\, [\,E_{y/z}\{g(z,y) F(z,y)\}\,] \tag{3.8}$$

where g is any function measurable in (z,y) satisfying the condition

$$E_{y/z}\{g(z,y) F(z,y)\} \neq 0, \quad z \in \mathbb{R}. \tag{3.9}$$

Verification of this result is by direct substitution of (3.8) into (3.6).
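The direct substitution can also be checked numerically. The sketch below is only an illustration under stated assumptions: z is held fixed, y is given a hypothetical three-point conditional distribution, and the tabulated values of F and G are made up. The helper names `cond_exp` and `Q_from_g` are ours, not the paper's. For each admissible g the residual $E_{y/z}\{QF\} - G$ of (3.6) should vanish, while the resulting Q differs from one g to the next.

```python
# Hypothetical conditional law of y given one fixed z, and made-up F, G values.
p = [0.2, 0.5, 0.3]          # P(y_j | z)
F = [1.0, 1.3, 0.5]          # F(z, y_j), assumed given
G = 1.575                    # G(z), assumed given


def cond_exp(values):
    """E_{y/z}{.} for a function tabulated on the support of y."""
    return sum(pj * vj for pj, vj in zip(p, values))


def Q_from_g(g):
    """Family (3.8): Q = g G / E_{y/z}{g F}; g must satisfy condition (3.9)."""
    denom = cond_exp([gj * Fj for gj, Fj in zip(g, F)])
    if abs(denom) < 1e-12:
        raise ValueError("g violates (3.9): E{gF} = 0")
    return [gj * G / denom for gj in g]


# Three different admissible choices of g (all illustrative).
candidates = {
    "constant g": [1.0, 1.0, 1.0],
    "affine g": [2.0, 1.0, 0.0],
    "sign-changing g": [1.0, -0.5, 3.0],
}

solutions = {}
residuals = {}
for name, g in candidates.items():
    Q = Q_from_g(g)
    solutions[name] = Q
    # Residual of (3.6): E_{y/z}{Q F} - G
    residuals[name] = cond_exp([q * f for q, f in zip(Q, F)]) - G
    print(name, [round(q, 4) for q in Q], residuals[name])
```

Each g yields a different Q, yet all of them satisfy (3.6); this is the nonuniqueness that the rest of the section exploits.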

Now, let $\mathcal{F}_z$ denote the sigma-algebra generated by z, and $\mathcal{F}_{z \vee y}$ be
the sigma algebra generated by z and y. The nonuniqueness of Q(z,y), as
described by (3.8), stems from the fact that it is not measurable with respect
to $\mathcal{F}_z$, the sigma algebra generated by the information acquired by the follower.
More precisely, since $\mathcal{F}_{z \vee y} \supset \mathcal{F}_z$, every atom 2) of $\mathcal{F}_z$ is a union of atoms of $\mathcal{F}_{z \vee y}$,
and the collection of atoms of $\mathcal{F}_z$ gives a coarser partition of $\Omega$ than the
corresponding collection from $\mathcal{F}_{z \vee y}$ [4]. Considering (3.6), the defining
equation for Q(z,y), the $\mathcal{F}_{z \vee y}$-measurable function Q(z,y)F(z,y) assumes a single
value on an atom of $\mathcal{F}_{z \vee y}$. As each atom of $\mathcal{F}_z$ is composed of a union of atoms
of $\mathcal{F}_{z \vee y}$, Q(z,y)F(z,y) will in general take more than one value on every atom of
$\mathcal{F}_z$. When the conditional expectation is taken, Q(z,y)F(z,y) is averaged over
the atoms of $\mathcal{F}_z$ to yield G(z) for given F(z,y). Clearly, infinitely many
functions may yield the same average, such as the family characterized by (3.8).
We note that the nonuniqueness is further pronounced by the presence of the
$\mathcal{F}_{z \vee y}$-measurable function F(z,y). Should F(z,y) be $\mathcal{F}_z$-measurable, then the
nonuniqueness of Q(z,y) as in (3.8) drops out when the expected value of Q(z,y)
conditioned on $\mathcal{F}_z$ is computed. However, when F(z,y) is not $\mathcal{F}_z$-measurable, then
this nonuniqueness is genuine, in the sense that it remains to hold true even
after the expected value of Q(z,y) conditioned on $\mathcal{F}_z$ is computed, an operation
performed by the follower when computing his optimum reaction to an announced
strategy.

2) A is called an atom of a sigma algebra $\mathcal{F}$ if $A \in \mathcal{F}$, and no subset of A belongs
to $\mathcal{F}$ other than A itself and the empty set.
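The averaging argument can be made concrete on a finite sample space. The following is a minimal sketch with hypothetical outcomes and probabilities: the atoms of the sigma algebra generated by (z, y) are the single outcomes, the atoms of the one generated by z group all outcomes sharing the same z, and `h1`, `h2` stand in for two different candidates for the product Q(z,y)F(z,y). Exact rational arithmetic is used so that equality can be checked without rounding.

```python
from collections import defaultdict
from fractions import Fraction as Fr

# Hypothetical finite sample space: outcomes are (z, y) pairs with probabilities.
prob = {
    ("z1", "y1"): Fr(1, 4),
    ("z1", "y2"): Fr(1, 4),
    ("z2", "y1"): Fr(1, 6),
    ("z2", "y2"): Fr(1, 3),
}

# Atoms of the sigma algebra generated by z: unions of the finer (z, y) atoms.
atoms_z = defaultdict(list)
for (z, y) in prob:
    atoms_z[z].append((z, y))


def cond_exp_given_z(h):
    """Average h over each atom of the z-generated sigma algebra."""
    out = {}
    for z, atom in atoms_z.items():
        mass = sum(prob[w] for w in atom)
        out[z] = sum(prob[w] * h[w] for w in atom) / mass
    return out


# Two different functions of (z, y), standing in for Q(z,y) F(z,y).
h1 = {("z1", "y1"): Fr(2), ("z1", "y2"): Fr(4), ("z2", "y1"): Fr(3), ("z2", "y2"): Fr(0)}
h2 = {("z1", "y1"): Fr(5), ("z1", "y2"): Fr(1), ("z2", "y1"): Fr(-1), ("z2", "y2"): Fr(2)}

avg1 = cond_exp_given_z(h1)
avg2 = cond_exp_given_z(h2)
print(avg1, avg2)
```

Both functions vary within each atom of the coarser sigma algebra, yet they average to the same function of z, which is all that (3.6) constrains.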


We now summarize the result on the nonuniqueness of Q(z,y) below:

Theorem 1. Problem A formulated in Section II admits, for the scalar version,
infinitely many linear (in v) solutions, with one such family given by (3.2),
(3.8), under condition (3.9).

In view of this nonuniqueness feature of the solution to Problem A,
Problem B becomes relevant, which we address in the sequel. Towards this end,
let us assume that the leader adopts the policy (3.2) with Q chosen as in
(3.8) and g being arbitrary (but satisfying (3.9)). For any such g, this is
an optimal policy inducing $(u^t,v^t)$, provided that $\alpha = \alpha^*$. If $\alpha \neq \alpha^*$, however, the
follower's reaction to $\gamma$ will no longer be $v = v^t$. In fact, substituting (3.2)
into (3.4) with $\alpha^*$ replaced by a general $\alpha$, and differentiating the resulting
expression with respect to v, we obtain (to replace (3.5))

$$E_{y/z}\{Q(z,y)\, E_{x/y,z}\{L_u(u^t - Q[v_\alpha - v^t], v_\alpha, x, \alpha)\}\} = E_{x,y/z}\{L_v(u^t - Q[v_\alpha - v^t], v_\alpha, x, \alpha)\} \tag{3.10}$$

which admits a unique solution $v_\alpha(z)$ when $\alpha$ lies in a neighborhood of $\alpha^*$ (because
of strict convexity of L). This solution is not obtainable explicitly, unless
we specify a structure for L (such as quadratic); however, we in fact do not
need an explicit expression for $v_\alpha(z)$, as the following discussion reveals.

The solution $v_\alpha(z)$ to (3.10) will, in general, depend on different
choices of Q out of the family (3.8)-(3.9). What Problem B alludes to, is a
choice which will render the difference $|v_\alpha(z) - v^t(z)|$ sufficiently small (in
norm) whenever $\alpha$ is close to $\alpha^*$. Note that, if $v_\alpha(z)$ is close to $v^t(z) = v_{\alpha^*}(z)$,
then $u_\alpha(z,y) = \gamma(v_\alpha,z,y)$ will be close to $u^t(z,y) = u_{\alpha^*}(z,y)$, because of
continuity properties of $\gamma$. Hence, as a measure of the closeness of $v_\alpha(z)$ to
$v^t(z)$, we now take the first order term $dv_\alpha/d\alpha$ and evaluate it at $\alpha = \alpha^*$.
Since (3.10) is an identity for all $\alpha$ of interest, we could differentiate
it with respect to $\alpha$ (for each fixed z), to obtain the equality (at $\alpha = \alpha^*$):

$$E_{y/z}\Big\{Q(z,y)\frac{dv^t}{d\alpha} E_{x/y,z}\{L_{uv}(u^t,v^t,x,\alpha^*)\} - Q^2(z,y)\frac{dv^t}{d\alpha} E_{x/y,z}\{L_{uu}(u^t,v^t,x,\alpha^*)\} + Q(z,y)\, E_{x/y,z}\{L_{u\alpha}(u^t,v^t,x,\alpha^*)\}\Big\}$$
$$= E_{x,y/z}\Big\{\frac{dv^t}{d\alpha} L_{vv}(u^t,v^t,x,\alpha^*) - Q(z,y)\frac{dv^t}{d\alpha} L_{vu}(u^t,v^t,x,\alpha^*) + L_{v\alpha}(u^t,v^t,x,\alpha^*)\Big\}$$

where

$$\frac{dv^t(z)}{d\alpha} = \left.\frac{dv_\alpha(z)}{d\alpha}\right|_{\alpha=\alpha^*}. \tag{3.11}$$

Since $dv^t/d\alpha$ is $\mathcal{F}_z$-measurable, we can easily solve for it to obtain

$$\frac{dv^t}{d\alpha} = \frac{E_{x,y/z}\{L_{u\alpha} Q - L_{v\alpha}\}}{E_{x,y/z}\{L_{uu} Q^2 - 2 L_{uv} Q + L_{vv}\}} \tag{3.12}$$

where $L_{u\alpha}$, $L_{v\alpha}$, $L_{uu}$, $L_{uv}$ all have $(u^t,v^t,\alpha^*)$ as their arguments. Note that since
L was strictly convex in (u,v), the denominator of (3.12) is always positive and
hence $dv^t/d\alpha$, which we will henceforth call the first order sensitivity function,
is well-defined.

Now, the first objective is to make this expression identically zero,
by an appropriate choice of g(z,y) in (3.8). Substituting (3.8) into the
numerator of (3.12) and reshuffling some terms, we obtain the condition

$$E_{y/z}\{g(z,y) f_1(z,y)\} = 0 \tag{3.13}$$

where

$$f_1(z,y) = F(z,y)\, E_{x,y/z}\{L_{v\alpha}(u^t,v^t,x,\alpha^*)\} - G(z)\, E_{x/y,z}\{L_{u\alpha}(u^t,v^t,x,\alpha^*)\}. \tag{3.14}$$

Now let

$$\bar\Gamma^s = \{g \in \Gamma^s : g \text{ satisfies (3.9)}\}. \tag{3.15}$$

Then it follows from the above discussion that if there exists a $g \in \bar\Gamma^s$
satisfying (3.13), the corresponding policy for the leader renders the
first-order sensitivity function zero, i.e., to first-order $v_\alpha$ (and consequently $u_\alpha$)
becomes insensitive to variations in the value of $\alpha$, from the nominal value $\alpha^*$.


But, such a $g \in \bar\Gamma^s$ exists generically: choose any random variable that is
orthogonal to $f_1 \in \Gamma^s$ under the conditional probability measure P(y/z), but
not orthogonal to F(z,y), which will be possible as long as F is not linearly
dependent on $f_1(z,y)$. This holds true if and only if we have, for every
$k(\cdot) \in \Gamma_f$,

$$E_{x/y,z}\{L_{u\alpha}(u^t,v^t,x,\alpha^*)\} \neq k(z)\, E_{x/y,z}\{L_u(u^t,v^t,x,\alpha^*)\}. \tag{3.16}$$

A candidate solution to (3.13) is then

$$g(z,y) = \frac{E_{y/z}\{g_1(z,y) f_1(z,y)\}}{E_{y/z}\{f_1(z,y)\}} - g_1(z,y) \tag{3.17}$$

which can easily be shown to satisfy (3.13) for any $\mathcal{F}_{z \vee y}$-measurable $g_1(z,y)$.
Hence, we in fact have a family of solutions parameterized by $g_1(z,y)$:

Theorem 2. Let (3.3) and (3.16) be satisfied. Then, the first order sensitivity
function $dv^t/d\alpha$ can be made identically zero by an incentive policy of the form
(3.2) where Q is given by (3.8) and $g \in \bar\Gamma^s$ satisfies (3.13), with one family of
candidates being (3.17).

Proof. The proof follows by construction, from the discussion preceding the
theorem. □
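To see the construction at work, the sketch below evaluates it on a small quadratic example. Everything specific here is an assumption made for illustration and is not taken from the paper: the loss function (stated in the code comments), the three-point conditional law of y for one fixed z, the conditional means of x, the desired point, and the choice $g_1 = E[x \mid y,z]$ in (3.17). For this loss the follower's stationarity condition is affine in v, so the reaction can be computed exactly and compared with formula (3.12).

```python
# Illustrative quadratic loss (an assumption, not taken from the paper):
#   L(u, v, x, a) = u**2/2 + v**2/2 + c*u*v - x*u + a*(x*u + v),   |c| < 1,
# so L_u = u + c*v - (1 - a)*x,  L_v = v + c*u + a,
#    L_ua = x,  L_va = 1,  L_uu = L_vv = 1,  L_uv = c.
# z is held fixed and suppressed; y takes three values.

p = [0.2, 0.5, 0.3]        # P(y_j | z)
m = [1.0, 0.5, 2.0]        # E[x | y_j, z]
ut = [0.95, 1.25, 0.45]    # desired leader action u^t(z, y_j)
vt = 0.1                   # desired follower action v^t(z)
c, a_star = 0.5, 1.0       # coupling coefficient, nominal parameter value


def E(f):
    """Conditional expectation E_{y/z}{.} of a function tabulated on y."""
    return sum(pj * fj for pj, fj in zip(p, f))


# F and G of (3.7), evaluated at the nominal parameter value.
F = [ut[j] + c * vt - (1 - a_star) * m[j] for j in range(3)]
G = vt + c * E(ut) + a_star


def Q_from_g(g):
    """Family (3.8)."""
    denom = E([g[j] * F[j] for j in range(3)])
    return [g[j] * G / denom for j in range(3)]


def reaction(Q, a):
    """Follower's optimal v_a(z) against policy (3.2) when the true parameter is a."""
    def h(v):  # derivative of the follower's expected cost with respect to v
        total = 0.0
        for j in range(3):
            u = ut[j] - Q[j] * (v - vt)
            Lu = u + c * v - (1 - a) * m[j]    # E_{x/y,z}{L_u}
            Lv = v + c * u + a                 # E_{x/y,z}{L_v}
            total += p[j] * (Lv - Q[j] * Lu)
        return total
    h0, h1 = h(0.0), h(1.0)                    # h is affine in v for this loss
    return -h0 / (h1 - h0)


def sensitivity(Q):
    """First order sensitivity function (3.12) for this loss."""
    num = E([m[j] * Q[j] - 1.0 for j in range(3)])
    den = E([Q[j] ** 2 - 2 * c * Q[j] + 1.0 for j in range(3)])
    return num / den


# f_1 of (3.14) and the candidate (3.17) with the illustrative choice g_1 = m.
f1 = [F[j] * 1.0 - G * m[j] for j in range(3)]
g1 = m
g0 = E([g1[j] * f1[j] for j in range(3)]) / E(f1)
g_robust = [g0 - g1[j] for j in range(3)]

Q_const = Q_from_g([1.0, 1.0, 1.0])
Q_robust = Q_from_g(g_robust)

for name, Q in (("constant g", Q_const), ("robust g", Q_robust)):
    print(name,
          "| v at nominal:", reaction(Q, a_star),
          "| v at a* + 0.2:", reaction(Q, a_star + 0.2),
          "| dv/da by (3.12):", sensitivity(Q))
```

Both policies induce $v^t$ at the nominal value, but only the one built from (3.17) has a vanishing first order sensitivity. Because this particular loss is affine in the parameter, that policy keeps the reaction at $v^t$ for every parameter value, which is a feature of the example rather than of the general theory.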


It is shown in Appendix A that an expression for the second-order
sensitivity function

$$\frac{d^2 v^t(z)}{d\alpha^2} = \left.\frac{d^2 v_\alpha(z)}{d\alpha^2}\right|_{\alpha=\alpha^*} \tag{3.18}$$

is given by

$$\frac{d^2 v^t}{d\alpha^2} = \frac{E_{x,y/z}\{L_{u\alpha\alpha} Q - L_{v\alpha\alpha}\}}{E_{x,y/z}\{L_{uu} Q^2 - 2 L_{uv} Q + L_{vv}\}} \tag{3.19}$$

which can be made zero if and only if $g(z,y) \in \bar\Gamma^s$ in (3.8) satisfies [as a
counterpart of (3.13)]:

$$E_{y/z}\{g(z,y) f_2(z,y)\} = 0 \tag{3.20}$$

where

$$f_2(z,y) = F(z,y)\, E_{x,y/z}\{L_{v\alpha\alpha}(u^t,v^t,x,\alpha^*)\} - G(z)\, E_{x/y,z}\{L_{u\alpha\alpha}(u^t,v^t,x,\alpha^*)\}. \tag{3.21}$$

Hence, for both first and second-order sensitivity functions to be identically
zero, it is sufficient to find a $g \in \bar\Gamma^s$ which is orthogonal to both $f_1 \in \Gamma^s$ and
$f_2 \in \Gamma^s$ under the measure P(y/z). But this is generically possible because, for
fixed $z \in \mathbb{R}$, $\Gamma^s$ can be made a pre-Hilbert space under the inner product

$$\langle g, f \rangle = \int_{\mathbb{R}} g(z,y) f(z,y)\, dP(y/z)$$

with $g, f \in \Gamma^s$, where P(y/z) is a probability measure. In order to insure that
there exists a $g \in \bar\Gamma^s$ orthogonal to both $f_1$ and $f_2$, we have to assume, in
addition to (3.16), the validity of the condition

$$E_{x/y,z}\{L_{u\alpha\alpha}(u^t,v^t,x,\alpha^*)\} \neq k(z)\, E_{x/y,z}\{L_u(u^t,v^t,x,\alpha^*)\}, \quad \forall k(\cdot) \in \Gamma_f. \tag{3.22}$$

This then leads to the following theorem:

Theorem 3. Let conditions (3.3), (3.16) and (3.22) be satisfied. Then, for the
scalar stochastic incentive problem of this section, there exists an incentive
policy for the leader which induces the follower to play $v = v^t$ when $\alpha = \alpha^*$
(the nominal value) and furthermore makes the sensitivity functions of orders
1 and 2 (i.e., $dv^t/d\alpha$ and $d^2v^t/d\alpha^2$) identically zero a.e. P(y/z). Such a policy
is given by (3.8) where $g \in \bar\Gamma^s$ satisfies (3.13) and (3.20).

Proof. (3.16) and (3.22) guarantee that $f_1$ and $f_2$ do not linearly depend on
F(z,y). Without loss of generality, let us assume that $f_1$ and $f_2$ are linearly
independent. Then, there exists an orthonormal system $(e_1, e_2)$ in $\Gamma^s$ spanning
the same subspace as $(f_1, f_2)$ [5]. Now an $e_3 \in \bar\Gamma^s$ orthogonal to both $e_1$ and $e_2$
can be constructed using the Gram-Schmidt orthogonalization procedure, a.e. P(y/z).
This $e_3$ is the desired g. □
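The orthogonalization step of the proof can be sketched for a discrete conditional law of y, where the inner product reduces to a weighted sum. The following is an illustration only: the five-point distribution and the tabulated values standing in for F, $f_1$ and $f_2$ are made up, and starting the procedure from F itself is our design choice (it is one convenient way to keep the resulting g from being orthogonal to F, as (3.9) requires), not something the proof prescribes.

```python
# Hypothetical conditional law of y given a fixed z, on a five-point support.
p = [0.1, 0.2, 0.3, 0.25, 0.15]


def inner(g, f):
    """<g, f> = sum_j g_j f_j P(y_j | z): discrete analogue of the inner product."""
    return sum(pj * gj * fj for pj, gj, fj in zip(p, g, f))


def orthogonal_component(target, basis_inputs):
    """Gram-Schmidt: orthonormalize basis_inputs, then remove their span from target."""
    ortho = []
    for f in basis_inputs:
        r = list(f)
        for e in ortho:
            coef = inner(r, e)
            r = [rj - coef * ej for rj, ej in zip(r, e)]
        norm = inner(r, r) ** 0.5
        if norm > 1e-12:                 # skip linearly dependent inputs
            ortho.append([rj / norm for rj in r])
    g = list(target)
    for e in ortho:
        coef = inner(g, e)
        g = [gj - coef * ej for gj, ej in zip(g, e)]
    return g


# Made-up stand-ins for F, f_1 and f_2 tabulated on the support of y.
F = [1.0, 1.3, 0.5, -0.2, 0.8]
f1 = [-0.6, 0.5, -2.6, 1.0, 0.3]
f2 = [0.4, -1.0, 0.7, 0.2, -1.5]

# Start from F so that <g, F> equals <g, g>, which is nonzero whenever F is not
# in the span of f1 and f2.
g = orthogonal_component(F, [f1, f2])
print("<g,f1> =", inner(g, f1), " <g,f2> =", inner(g, f2), " <g,F> =", inner(g, F))
```

Passing a longer list of functions to `orthogonal_component` handles more orthogonality constraints in the same way, provided the support of y is rich enough for F to stay outside their span.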

The lines that led to the proof of Theorem 3 suggest that higher order
sensitivity functions can be annihilated using the same approach. Towards this
end, let us assume that $L(u,v,\alpha)$ is N times differentiable in $\alpha$, where N is an
arbitrarily large positive integer. Let $\mathcal{N}$ denote the index set {1,2,...,N}.
Then, the n'th order sensitivity function is defined as

$$\frac{d^n v^t(z)}{d\alpha^n} = \left.\frac{d^n v_\alpha(z)}{d\alpha^n}\right|_{\alpha=\alpha^*}, \quad n \in \mathcal{N}. \tag{3.23}$$

Our objective is to annihilate the above expression for all $n \in \mathcal{N}$.
This problem alludes to rendering the N'th order Taylor approximation of $v_\alpha(z)$
sufficiently close to $v^t(z)$. Indeed, let the true value of $\alpha$ be

$$\alpha = \alpha^* + \varepsilon \tag{3.24}$$

where $\varepsilon$ is a sufficiently small real number. The N'th order Taylor approximation
of $v_\alpha(z)$ about $\alpha^*$ is

$$v_\alpha(z) = v^t(z) + \sum_{n=1}^{N} \frac{\varepsilon^n}{n!} \frac{d^n v^t(z)}{d\alpha^n} + o(\varepsilon^N) \tag{3.25}$$

where $v^t(z)$ is the first-best solution of the incentive control problem. To
make $v_\alpha(z)$ as close as possible to the first-best solution, we will choose
Q(z,y) such that (3.23) vanishes for all $n \in \mathcal{N}$. It is shown in Appendix A that

$$\frac{d^n v^t}{d\alpha^n} = \frac{E_{x,y/z}\{L_{u\alpha^n} Q - L_{v\alpha^n}\}}{E_{x,y/z}\{L_{uu} Q^2 - 2 L_{uv} Q + L_{vv}\}}; \tag{3.26}$$

3) this expression can be made zero, if and only if $g \in \bar\Gamma^s$ in (3.8) satisfies

$$E_{y/z}\{g(z,y) f_n(z,y)\} = 0 \tag{3.27}$$

where

$$f_n(z,y) = F(z,y)\, E_{x,y/z}\{L_{v\alpha^n}(u^t,v^t,x,\alpha^*)\} - G(z)\, E_{x/y,z}\{L_{u\alpha^n}(u^t,v^t,x,\alpha^*)\}, \tag{3.28}$$

and

$$E_{x/y,z}\{L_{u\alpha^n}(u^t,v^t,x,\alpha^*)\} \neq k(z)\, E_{x/y,z}\{L_u(u^t,v^t,x,\alpha^*)\}, \quad \forall k(\cdot) \in \Gamma_f, \tag{3.29}$$

where (3.29) is a necessary and sufficient condition for the linear independence
of $f_n$ from F(z,y), and $f_n$ is the counterpart of (3.14) and (3.21) specialized
for the n'th order sensitivity function. The fact that $\Gamma^s$ is a pre-Hilbert
space enables us to prove the following theorem.

3) 


Here  L 


is  the  n'th  order  partial  derivative  of  L  with  respect  to  1 . 
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Theorem  4 .  Let  conditions  (3.3)  and  (3.29)  be  satisfied  for  all  n^N.  Further 
let 

* 

a  »  a  +  e 

where  e  is  sufficiently  small.  Then,  there  exists  an  incentive  policy  for  the 
leader  which  induces 

v  (z)  =  vC(z)  +  o(sN)  a.e.  P(v/z) 

Ct 

where  vC(z)  is  the  first-best  solution  of  the  problem,  and  N  is  an  arbitrarily 

g"" 

large  finite  positive  integer.  Such  a  policy  is  given  by  (3.8)  where  g£F 
satisfies  (3.27)  for  all  n€Jl. 

Proof. Under (3.29), $f_n$ is linearly independent from $F(z,y)$, so that (3.9) is not violated. Let $S^N$ be the subspace of $\mathcal{F}^s$ spanned by $\{f_n,\ n \in \mathcal{N}\}$. Using the Gram-Schmidt orthogonalization procedure, one can construct an $e_{N+1} \in \mathcal{F}^s$ orthogonal to $S^N$ a.e. $P(y/z)$. This $e_{N+1}$ is the desired g. □
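The construction in the proof can be sketched numerically. The following is a minimal illustration, not the authors' procedure: it assumes that, for a fixed z, the private observation y takes finitely many values with probabilities `p` (a discrete stand-in for P(y/z)), so that each $f_n$ becomes a vector and the inner product is $E\{ab \mid z\}$. The function name `orthogonal_g` and the random starting candidate are hypothetical choices. Any side condition on g beyond orthogonality (the proof refers to (3.9)) is not checked here.

```python
import numpy as np

def orthogonal_g(fs, p, seed=None, tol=1e-10):
    """Return g with E[g * f_n | z] = 0 for every row f_n of fs.

    fs : (N, K) array, row n holds f_n evaluated at K atoms of y (z fixed)
    p  : (K,) probabilities of the atoms, a discrete stand-in for P(y/z)
    """
    fs = np.asarray(fs, dtype=float)
    p = np.asarray(p, dtype=float)

    def inner(a, b):
        # Inner product <a, b> = E[a b | z] under the discrete measure p.
        return float(np.sum(p * a * b))

    basis = []                      # orthonormal basis of S^N = span{f_n}
    for f in fs:
        r = f.copy()
        for e in basis:             # remove components along earlier vectors
            r = r - inner(r, e) * e
        nrm = np.sqrt(inner(r, r))
        if nrm > tol:               # skip linearly dependent f_n
            basis.append(r / nrm)

    # e_{N+1}: take a candidate and strip its component in S^N.
    rng = np.random.default_rng(seed)
    g = rng.standard_normal(fs.shape[1])
    for e in basis:
        g = g - inner(g, e) * e
    if np.sqrt(inner(g, g)) <= tol:
        raise ValueError("candidate lies in span{f_n}; more atoms than f_n are needed")
    return g

# Usage: three functions of y on six equally likely atoms.
y = np.linspace(-1.0, 1.0, 6)
p = np.full(6, 1.0 / 6.0)
fs = np.vstack([y, y**2, y**3])
g = orthogonal_g(fs, p, seed=0)
print(np.abs(fs @ (p * g)).max())   # largest |E[g f_n | z]|, numerically zero
```

The number of atoms must exceed the number of functions $f_n$; otherwise the span may exhaust the whole space and no nonzero g remains.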


IV. GENERALIZATIONS TO THE VECTOR CASE


In the previous section, we confined our analysis to the case where $\mathcal{A}$, U and V are one-dimensional, mainly so as not to obscure the main ideas with cumbersome notation. In this section, we will let these spaces be finite-dimensional Euclidean, and show that the results of Section III can be generalized to the vector case as well. The first step towards this goal is to let $\mathcal{A}$ be a subset of $\mathbb{R}^r$, and obtain a counterpart of Theorem 4 for this vector-parameter case. Now, let the actual value of the parameter $\alpha$ be related to $\alpha^*$ through

\[ \alpha = \alpha^* + \varepsilon , \quad \varepsilon \in \mathbb{R}^r , \quad \alpha \in \mathcal{A} . \tag{4.1} \]


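Then the N'th order Taylor approximation of $v_\alpha(z)$ around $\alpha^*$ for $\varepsilon$ sufficiently small is

\[ v_\alpha(z) = v^t(z) + \sum_{n=1}^{N} \frac{1}{n!} D_\varepsilon^n \{ v_\alpha(z) \} + o(\|\varepsilon\|^N) \tag{4.2} \]

where

\[ D_\varepsilon^n v_\alpha(z) = (\varepsilon_1 D_1 + \cdots + \varepsilon_r D_r)^n \{ v_\alpha(z) \} \tag{4.3} \]

and $D_i$ is the partial differential operator with respect to the i'th component of the r-dimensional vector $\alpha$, acting on $v_\alpha(z)$. Similarly, $\varepsilon_i$ is the i'th component of $\varepsilon$. We, therefore, have to find a Q orthogonal to $D_\varepsilon^n v_\alpha(z)$ for all permissible $\varepsilon \in \mathbb{R}^r$ and $n \in \mathcal{N}$. This is accomplished if we take Q to be orthogonal to the set of vectors defined by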


\[ D_1^{j_1} \cdots D_r^{j_r} \{ v_\alpha(z) \} \tag{4.4a} \]

\[ \forall j_i \in Z_+ \ \text{such that}\ j_1 + \cdots + j_r = n , \tag{4.4b} \]

where $Z_+$ is the set of positive integers.
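To see how many derivatives of the form (4.4a) have to be annihilated, the index vectors in (4.4b) can simply be enumerated. The sketch below assumes that the components $j_i$ are allowed to be zero (so that pure derivatives such as $D_1^n$ are included); this is an assumption about the intended index set, and the helper names are hypothetical.

```python
from itertools import product
from math import comb

def multi_indices(r, n):
    """All (j_1, ..., j_r) of nonnegative integers with j_1 + ... + j_r = n."""
    return [j for j in product(range(n + 1), repeat=r) if sum(j) == n]

def count_up_to(r, N):
    """Number of distinct derivatives D_1^{j_1}...D_r^{j_r} of order 1..N."""
    return sum(len(multi_indices(r, n)) for n in range(1, N + 1))

print(multi_indices(2, 2))   # [(0, 2), (1, 1), (2, 0)]
print(count_up_to(2, 3))     # 9
```

Under this assumption the total count for orders 1 through N equals $\binom{N+r}{r} - 1$, which is finite for every finite N and r.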

Let us assume that $L(u,v,x,\alpha)$ is sufficiently smooth so that the expressions given in the sequel are well-defined. An expression for the i'th component of the vector valued n'th order sensitivity function is then given by

\[ \{ D_1^{j_1} \cdots D_r^{j_r} v_\alpha(z) \}_i = \frac{E_{x,y/z}\{ D_1^{j_1} \cdots D_r^{j_r}\{L_u\} Q - D_1^{j_1} \cdots D_r^{j_r}\{L_v\} \}}{E_{x,y/z}\{ L_{uu} Q^2 - 2 L_{uv} Q + L_{vv} \}} , \quad \forall j_i \in Z_+ \ \text{such that}\ j_1 + \cdots + j_r = n . \tag{4.5} \]


For a finite N, there will be a finite number of these vectors in (4.2). It is therefore possible to find a Q orthogonal to every term in $\frac{1}{n!} D_\varepsilon^n v_\alpha(z)$, under some linear independence conditions to be delineated in the sequel. As a counterpart of (3.27), we should have

\[ E_{y/z}\{ g(z,y) f_n(z,y) \} = 0 , \quad n \in \mathcal{N} \tag{4.6} \]


where the components of the vector valued function $f_n$ are given by

\[ \{ f_n(z,y) \}_i = F(z,y)\, E_{x,y/z}\{ D_1^{j_1} \cdots D_r^{j_r}\{ L_v(u^t,v^t,x,\alpha^*) \} \} - G(z)\, E_{x/y,z}\{ D_1^{j_1} \cdots D_r^{j_r}\{ L_u(u^t,v^t,x,\alpha^*) \} \} , \quad \forall j_i \in Z_+ \ \text{such that}\ j_1 + \cdots + j_r = n . \tag{4.7} \]

It is possible to choose such a g provided that $\{ f_n(z,y) \}_i$ is linearly independent from $F(z,y)$. This condition is characterized by

\[ E_{x/y,z}\{ L_u(u^t,v^t,x,\alpha^*) \} \neq k(z)\, E_{x/y,z}\{ D_1^{j_1} \cdots D_r^{j_r}\{ L_u(u^t,v^t,x,\alpha^*) \} \} \tag{4.8} \]

for all $F_z$-measurable $k(\cdot)$; $\forall j_i \in Z_+$ such that $j_1 + \cdots + j_r = n$; $\forall n \in \mathcal{N}$.

We  can  now  state  the  counterpart  of  Theorem  4  for  this  case  where  a  is 
a  vector.  Its  proof  is  along  lines  similar  to  the  one  of  the  previous  theorem, 
and  is  therefore  omitted. 

Theorem 5. Let conditions (3.3) and (4.8) be satisfied. Let also

\[ \alpha = \alpha^* + \varepsilon \]

where $\varepsilon \in \mathbb{R}^r$ is sufficiently small. Then, there exists an incentive policy for the leader which induces

\[ v_\alpha(z) = v^t(z) + o(\|\varepsilon\|^N) \quad \text{a.e. } P(y/z) \]

where $v^t(z)$ is the first-best solution of the problem, and N is an arbitrarily large finite positive integer. Such a policy is given by (3.8) where $g \in \mathcal{F}^s$ satisfies (4.6). □

We now let U and V be identical to $\mathbb{R}^m$ and $\mathbb{R}^\ell$, respectively. In this case, the defining equation for the $(m \times \ell)$ matrix $Q(z,y)$:

\[ E_{y/z}\{ F(z,y) Q(z,y) \} = G(z) \tag{4.9} \]

can be rewritten as a set of $\ell$ equations given by

\[ E_{y/z}\{ F(z,y) Q_i(z,y) \} = G_i , \quad i \in \mathcal{L} , \tag{4.10} \]

\[ \mathcal{L} = \{ 1, \ldots, \ell \} , \tag{4.11} \]

where $Q_i(z,y)$ is the i'th column of $Q(z,y)$, and $G_i(z)$ is the i'th element of the $(1 \times \ell)$ vector $G(z)$. This equation alludes first to a Euclidean inner product between two vectors F and $Q_i$ with entries in $\mathcal{F}^s$, and then to an average of the resulting quantity over the atoms of $F_z$ to yield $G_i(z)$. Now, if we arbitrarily assign m-1 elements of $Q_i(z,y)$, perform the operations required by (4.10) on these arbitrarily assigned elements and the corresponding entries of $F(z,y)$, and transfer the resulting $F_z$-measurable function to the right side of (4.10), we are left with an equation analogous to (3.6), determining the remaining entry of $Q_i(z,y)$. If the corresponding entry of $F(z,y)$ is nonzero, the remaining entry of $Q_i(z,y)$ admits an infinity of solutions characterized by an equation identical to (3.3). We now summarize this result below.

Theorem 6. Let $F(z,y)$ be different from the zero vector with positive probability in (y,z). Then, any (m-1) elements of each column of $Q(z,y)$ can be arbitrarily assigned, provided that the corresponding entry of F for the remaining element of that column is nonzero. There exists an infinity of solutions for the remaining entry, characterized by (3.8), with F and G of (3.8) being properly identified. □
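The procedure behind Theorem 6 can be illustrated with a short numerical sketch. It assumes, purely for illustration, that for a fixed z the observation y takes K values with probabilities `p`, so that the conditional expectation in (4.10) becomes a weighted sum. The remaining entry is generated from a free function g in the ratio form that also appears in (4.13); the function name and argument layout are hypothetical.

```python
import numpy as np

def complete_column(F, G_i, p, free, g, s):
    """Fill entry s of one column Q_i so that E[F Q_i | z] = G_i.

    F    : (K, m) array, row k is the (1 x m) vector F(z, y_k)
    G_i  : scalar, i'th element of G(z)
    p    : (K,) probabilities of the atoms y_k (stand-in for P(y/z))
    free : (K, m) array of arbitrarily assigned entries (column s is ignored)
    g    : (K,) free function of y generating the remaining entry
    s    : index of the entry that is solved for
    """
    F = np.asarray(F, dtype=float)
    p = np.asarray(p, dtype=float)
    g = np.asarray(g, dtype=float)
    Q = np.array(free, dtype=float)
    Q[:, s] = 0.0
    # Transfer the contribution of the assigned entries to the right side.
    rhs = G_i - np.sum(p * np.sum(F * Q, axis=1))
    denom = np.sum(p * g * F[:, s])          # E[g F_s | z]
    if abs(denom) < 1e-12:
        raise ValueError("E[g F_s | z] must be nonzero")
    Q[:, s] = g * rhs / denom
    return Q

# Usage with random data: m = 3 entries, K = 5 atoms.
rng = np.random.default_rng(1)
F = rng.standard_normal((5, 3))
p = np.full(5, 0.2)
free = rng.standard_normal((5, 3))
g = rng.standard_normal(5)
Q_i = complete_column(F, 0.7, p, free, g, s=2)
print(np.sum(p * np.sum(F * Q_i, axis=1)))   # reproduces G_i up to rounding
```

Every admissible g yields a different remaining entry, which is the infinity of solutions referred to in the theorem.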

This result enables us to characterize the family of solutions to $Q(z,y)$ when U and V are finite-dimensional vector spaces. We now require that the $\ell$-vector $v_\alpha(z)$ be as close as possible to the first-best solution $v^t(z)$ when the actual parameter $\alpha$ is described by

\[ \alpha = \alpha^* + \varepsilon \]

for $\varepsilon \in \mathbb{R}^r$ being sufficiently small. Then, for each component of $v_\alpha(z)$, denoted by $v_\alpha^p(z)$, where p belongs to the index set $\mathcal{L}$, there is a Taylor expansion described by (4.2). In order to annihilate all the sensitivity functionals up to order N in the expansion of $v_\alpha(z)$, we need to choose Q orthogonal to the finite set of functions defined by

\[ D_1^{j_1} \cdots D_r^{j_r} \{ v_\alpha^p(z) \} , \quad \forall j_i \in Z_+ \ \text{such that}\ j_1 + \cdots + j_r = n ,\ n \in \mathcal{N} ,\ p \in \mathcal{L} . \]


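Let $L(u,v,x,\alpha)$ be sufficiently smooth. Then, an expression for the components of the numerator of the n'th order sensitivity function is given by

\[ \mathrm{num}\{ D_1^{j_1} \cdots D_r^{j_r} v_\alpha^p(z) \}_i = E_{x,y/z}\{ D_1^{j_1} \cdots D_r^{j_r}\{ F(z,y) \} Q_p - D_1^{j_1} \cdots D_r^{j_r}\{ G_p(z) \} \} , \quad \forall j_i \in Z_+ \ \text{such that}\ j_1 + \cdots + j_r = n ;\ p \in \mathcal{L} . \tag{4.12} \]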


We know from Theorem 6 that m-1 elements of the m-vector $Q_p$ can be arbitrarily assigned. Without loss of generality, let us make them equal to zero, remaining with a nonzero and $F_{z \vee y}$-measurable element $Q_{ps}$, for some positive integer s, less than or equal to m. This $Q_{ps}$ is characterized by

\[ Q_{ps}(z,y) = \frac{g_{ps}(z,y)\, G_p(z)}{E_{y/z}\{ g_{ps}(z,y) F_s(z,y) \}} \tag{4.13} \]

where $F_s$ is the s'th element of the $(1 \times m)$-vector $F(z,y)$, and $g_{ps}(z,y)$ is an $F_{z \vee y}$-measurable scalar valued function satisfying

\[ E_{y/z}\{ g_{ps}(z,y) F_s(z,y) \} \neq 0 , \quad z \in \mathbb{R} \]

where $\varepsilon \in \mathbb{R}^r$ is sufficiently small. Then, there exists an incentive policy for the leader which induces

\[ v_\alpha(z) = v^t(z) + o(\|\varepsilon\|^N) \quad \text{a.e. } P(y/z) , \quad v_\alpha(z) \in \mathbb{R}^\ell , \]

where $v^t(z)$ is the first-best solution of the problem, and N is an arbitrarily large finite positive integer. Such a policy is given by (4.13) where $g_{ps}$ is orthogonal to the functions defined by (4.15). □
V. AN ILLUSTRATIVE EXAMPLE

To illustrate the theory of the previous sections, we now consider an incentive design problem in a divisionalized firm [6], [7]. Many large firms are organized into a multidivisional structure with interdependencies among the divisions. For instance, when divisions compete in the same market, or when one division supplies another one with goods or services, conflicts of interest among divisions arise, and the corporate center needs to coordinate their decisions in order to maximize the firm's overall profit. In our example, we focus on the coordination of a division director (the follower) by the corporate center (the leader). Let $x_0$ be a random variable representing the actual state of the firm, consisting of an aggregation of its profit level, its market share, the market value of the shareowners' equity, and the like. The leader and the follower have access to a noisy measurement of $x_0$, denoted by z, through the internal reports provided by the staff of the corporate center. The leader also has access to a private measurement of $x_0$, denoted by y, provided by an independent market research firm. The decision variable of the follower is its effort level v, which is also observed by the leader, and the decision variable of the leader is the amount of centrally allocated scarce resources and is denoted by u. We assume that the state evolves according to

\[ x_1 = x_0 + B_1 u + B_2 v \tag{5.1} \]


where $B_1$ and $B_2$ represent the technology of transforming resources and labor into production. Under this setup, the objective of the follower is to minimize

\[ E_{x_0,y/z}\{ L \} = E_{x_0,y/z}\{ S (T - (x_0 + B_1 u + B_2 v))^2 + R v^2 \} \tag{5.2} \]

over v, where S and R are scalars weighting the regulation of $x_1$ and the disutility of effort, respectively. The parameter T represents a target for the state of the division, summarizing the division's and its director's best interests for the future; but this target is not necessarily consistent with the overall objective of the divisionalized firm. Furthermore, this target is set by the division director, and is not accurately known by the headquarters of the divisionalized firm; but we assume here that the headquarters have an a priori estimate of this target, denoted by $T^*$. An alternative interpretation is that this $T^*$ may represent a desired target by the headquarters for that particular division, while the division's director may perceive a different target T, whose actual value is known only by himself, and he performs his optimization according to that value.

Let the pair $(u^t,v^t)$ denote the optimizing arguments of the objective function of the leader, which is different from (5.2); this difference is mainly due to a discrepancy in the perception of the value of T (the leader perceives it as $T^*$).$^{4)}$ The aim of the leader is to induce the pair $(u^t,v^t)$ in this decision problem, in spite of the uncertainty he is faced with in the value of T. He will realize this goal by letting his decision variable u be a function of the follower's effort level.

This problem is a version of the principal-agent problem [8], where in this case the principal can observe the agent's effort level but is uncertain about his true objective. For a numerical illustration of this problem, we will assume some specific values for the parameters of the cost functional. More precisely, let L be (with $B_1 = B_2 = 1$, $S = R = 1/2$)

\[ L = \tfrac{1}{2} (T - (x_0 + u + v))^2 + \tfrac{1}{2} v^2 \tag{5.3} \]

where T is the uncertain parameter to the leader, with a nominal value $T^* = 2$.

4) We should note that this discrepancy is not the only factor that contributes to the difference between the two objective functions, but is the most pronounced one.

We further take $x_0$ to be normally distributed with mean one and unit variance, and the common observation to be given by

\[ z = x_0 + w_1 , \quad w_1 \sim N(0,1) . \tag{5.4} \]

In addition to z, the leader has access to

\[ y = x_0 + w_2 , \quad w_2 \sim N(0,1) \tag{5.5} \]

where $x_0$, $w_1$ and $w_2$ are mutually independent. Let the pair $(u^t,v^t)$ given by

\[ u^t = \frac{z}{2} + \frac{y}{3} + \frac{1}{3} \tag{5.6} \]

\[ v^t = \frac{z}{2} + \frac{1}{3} \tag{5.7} \]

optimize the objective functional of the leader.$^{5)}$ The leader seeks to induce the follower to choose $v = v^t$ when $T = T^*$, and to make sure that v is sufficiently close, and if possible equal, to $v^t$ when T is different from $T^*$. He will realize this aim using a strategy of the form (3.2). As a counterpart of (3.6), $Q(z,y)$ is defined in this case to be the set of solutions of

\[ E_{y/z}\{ Q(z,y)(2y + 4z - 3) \} = (13z - 2)/2 . \tag{5.8} \]

As in (3.8), a family of such solutions is characterized by

\[ Q(z,y) = \frac{(13z - 2)\, g(z,y)}{2\, E_{y/z}\{ g(z,y)(2y + 4z - 3) \}} \tag{5.9} \]

where $g(z,y)$ is any $F_{z \vee y}$-measurable function satisfying the condition

\[ E_{y/z}\{ g(z,y)(2y + 4z - 3) \} \neq 0 . \tag{5.10} \]

5) It is always possible to find a loss functional for the leader, quadratic in u, v and $x_0$, for which (5.6) and (5.7) provide a global minimum.

Any such $Q(z,y)$ will induce $v = v^t$ when $T = T^*$. However, when $T \neq T^*$, Q is made optimum by choosing $g(z,y)$ orthogonal to $f_1(z,y)$ defined by (3.14); equivalently,

\[ E_{y/z}\{ g(z,y) [ (13z - 2) - 2(2y + 4z - 3) ] \} = 0 . \tag{5.11} \]

A possible solution to this is given by

\[ g(z,y) = -3z^2 + 6zy + 4y - 5z + 10 \tag{5.12} \]

with the corresponding $Q(z,y)$ being

\[ Q(z,y) = (-3z^2 + 6zy + 4y - 5z + 10)/12 . \tag{5.13} \]

Now, the optimum reaction of the follower to an affine strategy (3.2) can be computed to yield, for all values of T:

\[ v_T(z) = \frac{ T(1 - E[Q/z]) + E[Q x_0/z] - E[x_0/z] + E[u^t Q/z] - E[u^t/z] + E[Q^2/z] v^t - E[Q/z] v^t }{ E[Q^2/z] - 2E[Q/z] + 2 } . \tag{5.14} \]

When $Q(z,y)$ is chosen as in (5.13), we have

\[ E[Q(z,y)/z] = 1 , \quad \forall z \in \mathbb{R} \tag{5.15} \]

so that the uncertain term T drops out from the optimal reaction of the follower. Since (5.13) satisfies (5.8), and the optimum reaction is independent of T, the first-best solution $(u^t,v^t)$ is induced for all values of the uncertain parameter T. To illustrate the impact of the additional information-induced optimum strategy (5.13), the optimum reaction of the follower is plotted against T for three different values of z (z = -0.5, 0.2 and 4.0). In Figs. 1-3 the solid lines represent the follower's optimum reaction when g is an arbitrary $F_z$-measurable function. Note that, for any $F_z$-measurable g, we have from (5.8)

\[ Q = (13z - 2)/(2(5z - 2)) . \tag{5.16} \]

On the other hand, the dashed lines represent the follower's optimum reaction to the optimum strategy (5.13) when T varies about $T^* = 2$. At $T = T^* = 2$, we have $v = v^t$ under both strategies. However, when T varies about $T^*$, the optimum reaction induced by the $F_z$-measurable Q departs from the first-best solution linearly, while under the optimum strategy it does not depart from $v^t$ for any value of T.
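The claims of this example can be checked numerically. The sketch below is only a verification aid under stated assumptions: it uses the Gaussian model (5.4)-(5.5), for which $y$ given $z$ is normal with mean $(1+z)/2$ and variance $3/2$ and $E[x_0/y,z] = (1+y+z)/3$ (standard Gaussian conditioning, not stated explicitly in the text), and it evaluates conditional expectations by Gauss-Hermite quadrature. All function names are hypothetical.

```python
import numpy as np

def cond_expect(f, z, order=8):
    """E[f(y) | z] for y | z ~ N((1 + z)/2, 3/2), by Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(order)  # weight exp(-t^2/2)
    y = (1.0 + z) / 2.0 + np.sqrt(1.5) * nodes
    return float(np.sum(weights * f(y)) / np.sqrt(2.0 * np.pi))

def u_t(z, y):
    return z / 2.0 + y / 3.0 + 1.0 / 3.0          # (5.6)

def v_t(z):
    return z / 2.0 + 1.0 / 3.0                    # (5.7)

def Q_opt(z, y):
    return (-3 * z**2 + 6 * z * y + 4 * y - 5 * z + 10) / 12.0   # (5.13)

def Q_flat(z, y):
    # F_z-measurable policy (5.16), constant in y.
    return np.full_like(y, (13 * z - 2) / (2 * (5 * z - 2)))

def reaction(T, z, Qfun):
    """Follower's optimum reaction (5.14) for target T and observation z."""
    E = lambda f: cond_expect(f, z)
    q = lambda y: Qfun(z, y)
    x_hat = lambda y: (1.0 + y + z) / 3.0          # E[x0 | y, z]
    vt = v_t(z)
    num = (T * (1 - E(q)) + E(lambda y: q(y) * x_hat(y)) - E(x_hat)
           + E(lambda y: u_t(z, y) * q(y)) - E(lambda y: u_t(z, y))
           + E(lambda y: q(y) ** 2) * vt - E(q) * vt)
    den = E(lambda y: q(y) ** 2) - 2 * E(q) + 2
    return num / den

for z in (-0.5, 0.2, 4.0):
    for T in (0.0, 2.0, 5.0):
        print(z, T, reaction(T, z, Q_opt) - v_t(z), reaction(T, z, Q_flat) - v_t(z))
```

Under these assumptions the first difference printed should vanish for every T, while the second should vanish only at T = 2 and grow linearly in T otherwise, in line with the behaviour described for the dashed and solid lines.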

The optimum policy (5.13) has been able to track $v^t$ independent of T, mainly because of the linearity of the optimum reaction (5.14) with respect to T. If the uncertain parameter were $B_1$, which is the technology of transforming resources into production, then since the follower's optimum reaction is not linear in $B_1$, the outcome of the corresponding near-optimum strategy would be very close to $v^t$ in a certain neighborhood of $B_1^*$, but considerable departures would be observed when $B_1^*$ is too far away from the actual value. We refer to [9] for an illustrative example of this kind where the uncertain parameter has been taken to be $B_1$.

[Figs. 1-3. Optimum response (5.14) of the follower to the leader's optimal policy (5.13) (dashed line) and $F_z$-measurable policy (5.16) (solid line).]

VI. CONCLUDING REMARKS

In  this  paper  we  have  obtained,  in  the  context  of  multi-person 
decision  making,  coordinator  (leader)  strategies  which  render  the  leader's 
performance  index  insensitive  or  minimally  sensitive  to  variations  in  the 
parameters  of  the  follower's  objective  functional,  under  some  linear 
independence  conditions.  This  appealing  property  is  intrinsic  to  stochastic 
decision  problems,  and  has  no  counterpart  in  deterministic  incentive  problems 
of  the  type,  say,  discussed  in  [3].  We  have  achieved  this  by  basically  exploiting 
the  redundancy  present  in  the  leader's  dynamic  information,  his  private  information 
and  the  ensuing  fact  that  the  leader's  decision  variable  is  not  measurable  with 
respect  to  the  follower's  information  field. 

A  possible  extension  of  the  general  approach  of  this  paper  would  be 
to  continuous-time  decision  problems  with  open-loop  information,  formulated 
in  a  Hilbert-space  setting,  along  the  lines  of  [10].  Yet  another  extension 
would  be  to  a  multistage  decision  problem  wherein  the  leader  uses  closed-loop 
information to compute his robust strategy. Derivation of optimum or near-optimum strategies in these contexts is currently under study.

APPENDIX A

In this appendix, we will derive an expression for $d^n v^t / d\alpha^n$ evaluated at $u = u^t$, $v = v^t$ and $\alpha = \alpha^*$. We first differentiate both sides of (3.11) with respect to $\alpha$ to obtain

\[ E_{y/z}\Big\{ Q \Big[ \frac{d^2 v^t}{d\alpha^2} E_{x/y,z}\{L_{uv}\} + \frac{d v^t}{d\alpha} E_{x/y,z}\{L_{uv\alpha}\} \Big] - Q^2 \Big[ \frac{d^2 v^t}{d\alpha^2} E_{x/y,z}\{L_{uu}\} + \frac{d v^t}{d\alpha} E_{x/y,z}\{L_{uu\alpha}\} \Big] + Q\, E_{x/y,z}\{L_{u\alpha\alpha}\} \Big\} \]
\[ = E_{x,y/z}\Big\{ \frac{d^2 v^t}{d\alpha^2} L_{vv} + \frac{d v^t}{d\alpha} L_{vv\alpha} - Q \Big[ \frac{d^2 v^t}{d\alpha^2} L_{vu} + \frac{d v^t}{d\alpha} L_{vu\alpha} \Big] + L_{v\alpha\alpha} \Big\} . \tag{A.1} \]

Since we seek a $Q(z,y)$ orthogonal to both $d v^t / d\alpha$ and $d^2 v^t / d\alpha^2$, we can consider $d v^t / d\alpha$ already to be annihilated in (A.1), and pull out the $F_z$-measurable $d^2 v^t / d\alpha^2$ from the conditional expectation. Using this and the smoothing property of conditional expectation, we readily obtain

\[ \frac{d^2 v^t}{d\alpha^2} = \frac{E_{x,y/z}\{ L_{u\alpha\alpha} Q - L_{v\alpha\alpha} \}}{E_{x,y/z}\{ L_{uu} Q^2 - 2 L_{uv} Q + L_{vv} \}} . \tag{A.2} \]

Then, an induction type of argument yields

\[ \frac{d^n v^t}{d\alpha^n} = \frac{E_{x,y/z}\{ L_{u\alpha^n} Q - L_{v\alpha^n} \}}{E_{x,y/z}\{ L_{uu} Q^2 - 2 L_{uv} Q + L_{vv} \}} . \tag{A.3} \]
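As a sketch of the base case of this induction (n = 1), one can start from the follower's stationarity condition under an affine strategy $u = u^t - Q(v - v^t)$. The form of that condition used below, $E_{x,y/z}\{L_v - Q L_u\} = 0$, is an assumption made here for illustration; it is consistent with the reaction (5.14) of the example but is not quoted from the text. Differentiating totally with respect to $\alpha$, with $du/d\alpha = -Q\, dv/d\alpha$ and $L_{uv} = L_{vu}$:

```latex
\begin{align*}
0 &= \frac{d}{d\alpha}\, E_{x,y/z}\{ L_v - Q L_u \} \\
  &= E_{x,y/z}\Big\{ L_{v\alpha} - Q L_{u\alpha}
     + \big( L_{vv} - 2 Q L_{uv} + Q^2 L_{uu} \big) \frac{dv}{d\alpha} \Big\} ,
\end{align*}
% dv/d\alpha depends on z only, so it can be pulled out of E_{x,y/z}:
\[
\frac{dv}{d\alpha}
  = \frac{E_{x,y/z}\{ L_{u\alpha} Q - L_{v\alpha} \}}
         {E_{x,y/z}\{ L_{uu} Q^2 - 2 L_{uv} Q + L_{vv} \}} .
\]
```

This has the same structure as (A.2) and (A.3) with n = 1; the higher orders follow the same pattern once the lower-order derivatives have been annihilated.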
ABSTRACT


In  this  paper,  we  consider  a  two-agent  stochastic  team  decision  problem 
with  a  hierarchical  decision  structure  in  a  general  Hilbert  space  setting.  One  of 
the  agents  has  a  different  perception  of  the  common  team  objective  functional, 
as  quantified  in  terms  of  a  finite  dimensional  parameter  vector.  The  other 
hierarchically  superior  agent,  uninformed  about  this  discrepancy,  but  endowed 
with  a  suitable  information  structure,  designs  a  near-optimal  incentive  policy 
such  that  the  incurred  value  of  the  original  team  functional  is  arbitrarily  close 
to  its  global  optimum,  in  spite  of  the  existing  discrepancy.  The  general  solution 
is  determined  by  some  orthogonality  relations  in  some  appropriately  constructed 
probability  measure  spaces,  and  leads  to  particularly  simple  incentive  policies. 
I. INTRODUCTION

In  this  paper,  we  consider  a  general  class  of  two-agent  stochastic 
dynamic  decision  problems  wherein  both  of  the  agents  jointly  optimize  a  given 
objective  functional,  which  gives  rise  to  a  so-called  team  decision  problem  [1]. 
Within  the  spirit  of  [2],  we  will  relax  the  basic  assumption  of  a  team  decision 
problem  that  the  agents  perceive  the  common  goal  in  exactly  the  same  way,  by 
allowing  one  of  the  agents  to  have  a  somewhat  different  perception  of  the  common 
objective, and quantifying this discrepancy in terms of an objective functional
which  differs  from  the  original  objective  functional  up  to  a  finite-dimensional 
parameter  vector  a.  We  further  assume  that  the  other  agent  is  not  informed  of 
this  discrepancy,  but  is  able  to  monitor  the  decision  of  the  former  by  assuming 
a  hierarchically  superior  position  in  the  decision  process.  The  problem  we 
address  in  the  sequel  is  derivation  of  near-optimal  decision  policies  for  the 
hierarchically  superior  agent  such  that  the  variation  in  the  decision  value  of 
the  agent  who  is  faced  with  a  discrepancy  in  the  common  objective  functional  is 
kept  to  a  certain  minimum. 

We will approach this problem using the notion of near-optimum stochastic incentive policies [3]. Here the hierarchically superior agent (decision maker), henceforth referred to as DM1, endowed with a suitable information structure to be delineated in the sequel, induces the other agent, henceforth referred to as DM2, to behave in a desired manner, while reducing the effects of the discrepancies to an arbitrarily small value under certain conditions. The next section presents a precise mathematical formulation for the problem, while Section III presents the main results. Proof of the main theorem has only been outlined in this paper, but a fuller version will be provided at the Conference in Las Vegas.
II. PROBLEM FORMULATION

Let $(\Omega, F, P)$ be an underlying probability space, on which three correlated random variables x, y and z are defined. Here, $x \in \mathbb{R}^n$ denotes the random state of nature, and $z \in \mathbb{R}^m$ is the common (to both agents) and $y \in \mathbb{R}^p$ the private (to DM1) information related to x. Let U and V be given real separable Hilbert spaces denoting the decision spaces of DM1 and DM2, respectively, and let $\Gamma_{1z}$ and $\Gamma_{2z}$ denote the corresponding policy spaces, characterized for each fixed $z \in \mathbb{R}^m$ by

\[ \Gamma_{2z} = \{ \text{measurable } \gamma_2 : \mathbb{R}^m \to V \ \text{such that}\ \langle \gamma_2(z), \gamma_2(z) \rangle_V < \infty \} \tag{2.1} \]

\[ \Gamma_{1z} = \{ \text{measurable } \gamma_1 : V \times \mathbb{R}^m \times \mathbb{R}^p \to U \ \text{such that}\ E_{y/z}\{ \langle \gamma_1[\gamma_2(z),z,y], \gamma_1[\gamma_2(z),z,y] \rangle_U \} < \infty ,\ \forall \gamma_2 \in \Gamma_{2z} \} \tag{2.2} \]

We also let $\Gamma^s_{1z} \subset \Gamma_{1z}$ indicate the set of all "static" policies for DM1, defined by

\[ \Gamma^s_{1z} = \{ \text{measurable } \gamma_1 : \mathbb{R}^m \times \mathbb{R}^p \to U ,\ \text{such that}\ E_{y/z} \langle \gamma_1(z,y), \gamma_1(z,y) \rangle_U < \infty \} . \tag{2.3} \]

These transformations are restricted by the (implicit) condition that the expectations are well-defined. Here, $\langle \cdot,\cdot \rangle_U$ and $\langle \cdot,\cdot \rangle_V$ denote the inner products associated with U and V, respectively, and the Hilbert spaces $\Gamma_{1z}$ and $\Gamma_{2z}$ have their own derived inner products.

We now introduce the objective functional $L : \mathbb{R}^n \times U \times V \times \mathcal{A} \to \mathbb{R}$ which describes the team decision problem, and where $\mathcal{A}$ is a subset of $\mathbb{R}$, on which a parameter $\alpha$ takes values. Here, the parameter $\alpha$ characterizes the above mentioned discrepancy. For a fixed $z \in \mathbb{R}^m$ and for all $\alpha \in \mathcal{A}$, L belongs to $L_2[\Omega, P(x,y/z)]$, the Hilbert space of square-integrable random variables under the conditional probability measure $P(x,y/z)$. We further let the team objective functional be

\[ J : \Gamma_1 \times \Gamma_2 \times \mathcal{A} \to \mathbb{R} \]

where

\[ J(z,\gamma_1,\gamma_2,\alpha) = E_{x,y/z}\{ L(x,u,v,\alpha) \mid u = \gamma_1(v,z,y) ,\ v = \gamma_2(z) \} , \quad \forall \alpha \in \mathcal{A} ,\ z \in \mathbb{R}^m . \tag{2.4} \]

We assume that $L(x,u,v,\alpha)$ is strictly convex on $U \times V$ for each $x \in \mathbb{R}^n$ and $\alpha \in \mathcal{A}$, and is continuous in all of its arguments. Now, for a fixed $\alpha \in \mathcal{A}$, say $\alpha^*$, and fixed $z \in \mathbb{R}^m$ (the realized value of this random variable), let $\{ u^t_{z,y} = \gamma_1^t(z,y) ,\ v^t_z = \gamma_2^t(z) \}$ denote a unique pair in $\Gamma^s_{1z} \times \Gamma_{2z}$ that globally minimizes $J(z,\gamma_1,\gamma_2,\alpha^*)$, where $u^t_{z,y}$ is square integrable under the conditional probability measure $P(y/z)$ derived from $P(x,y/z)$, with $u^t_{z,y}$ and $v^t_z$ being continuous in (z,y) and z, respectively. With this pair, one obtains the expected minimum value of the team decision problem, conditioned on the common observation z, at $\alpha = \alpha^*$, which we denote by $J^t_{\alpha^*,z}$. We assume that the hierarchically superior DM1 perceives this team decision problem at $\alpha = \alpha^*$, while DM2 has a different perception of $\alpha$, say $\alpha^+$, and this discrepancy gives rise to a different objective functional for DM2, $J(z,\gamma_1,\gamma_2,\alpha^+)$; with this discrepancy, the decision problem ceases to be a team one.

By virtue of the information structure of the problem, and of his hierarchically superior position, DM1 is allowed to announce his policy $\gamma_1(\gamma_2(z),z,y)$ in advance, and to implement it. Then, DM2 computes his optimum reaction $v_z^{\gamma_1}(\alpha^+)$, for each announced $\gamma_1$, given z and his perception $\alpha^+$. Since $\alpha^+$ is different from $\alpha^*$, the parameter value of the team decision problem that DM1 sticks to, and since DM1 does not know what DM2's perception $\alpha^+$ is, we generally have $v_z^{\gamma_1}(\alpha^+) \neq v_z^t$, $\forall \gamma_1 \in \Gamma_{1z}$. The goal adopted in this paper is to find a near-optimal $\gamma_1$ such that $v_z^{\gamma_1}(\alpha^+)$ is arbitrarily close, or equal, to $v^t_z$ (depending on the structure of L) when $\alpha^+$ is within a certain neighborhood of $\alpha^*$. By continuity of $\gamma_1$ with respect to v, the resulting $u_{z,y} \in \Gamma_{1z}$ will be arbitrarily close to $u^t_{z,y}$, and as a consequence of this near-optimal policy, the incurred team cost will be arbitrarily close to $J^t_{\alpha^*,z}$, in spite of the existing discrepancy regarding the team objective functional.
III. MAIN RESULTS

In this section, we will assume that $\alpha^+$ is in a neighborhood of $\alpha^*$, such that the Taylor series of $v_z^{\gamma_1}(\alpha^+)$ about $\alpha^*$ converges for a given $\gamma_1$, to be made precise in the sequel. We also assume that $L(x,u,v,\alpha)$ is Frechet analytic [4] in u, v, and analytic in $\alpha$, for each $\alpha \in \mathcal{A}$, where $\mathcal{A}$ is identified as the above neighborhood. The random variables x, z and y are jointly second-order random variables and L is measurable with respect to (w.r.t.) the sigma-field generated by x, so that

\[ E\{ L(x, \gamma_1(v_z(\alpha),z,y), v_z(\alpha), \alpha) \mid z \} \tag{3.1} \]

is well-defined for all $\gamma_1 \in \Gamma_{1z}$, $v_z(\alpha) = \gamma_2(z) \in \Gamma_{2z}$ and $\alpha \in \mathcal{A}$, $z \in \mathbb{R}^m$.

Towards the goal set in the previous section, we will now let DM1 adopt an affine (in v) policy [5-7], given by$^{1)}$

\[ \gamma_1(v,z,y) = u^t_{z,y} - Q_{z,y}(v - v^t_z) \tag{3.2} \]

where for each fixed z and y, $Q_{z,y}(\cdot) \in \mathcal{L}(V,U)$ is a bounded linear operator mapping V into U, and for each fixed $\gamma_2(z) \in \Gamma_{2z}$, the resulting $u^t_{z,y} - Q_{z,y}(v - v^t_z)$ is an element of $\Gamma_{1z}$ with $v = \gamma_2(z)$.

1) If the underlying decision problem is a Stackelberg game, a necessary condition for the existence of an affine policy inducing $(u^t,v^t)$ is that $\nabla_u L$ should be nonvanishing at the nominal point [5,3]. In our problem, this condition is not met. However, since $(u^t,v^t)$ is naturally induced at the nominal value of $\alpha$, the condition that $\nabla_u L$ be nonvanishing is not required, and in fact is not met in a team decision problem with a hierarchical structure.
We note in passing that this enjoys the desirable property of

\[ \gamma_1(v,z,y) = u^t_{z,y} \quad \text{when } \alpha^+ = \alpha^* \tag{3.3} \]

so that the only restrictions on $Q_{z,y}(v)$ are that it should be an element of $\mathcal{L}(V,U)$ for each fixed z and y and $\forall v \in V$, and it should be square integrable under $P(y/z)$ for a fixed $v \in V$, for fixed $z \in \mathbb{R}^m$. Equation (3.3) follows from the fact that

\[ v_z(\alpha^+) = v_z^t \quad \text{when } \alpha^+ = \alpha^* ,\ \forall Q_{z,y}(\cdot) \in \mathcal{L}(V,U) . \tag{3.4} \]

We now let

\[ \alpha^+ = \alpha^* + h_A , \quad h_A \in \mathbb{R} \ \text{such that}\ \alpha^+ \in \mathcal{A} . \tag{3.5} \]

Then, (3.4) no longer holds, and given the common observation z, DM2 faces the optimization problem of minimizing

\[ E_{x,y/z}\{ L(x, u^t_{z,y} - Q_{z,y}(v - v_z^t), v, \alpha^+) \} \tag{3.6} \]

over $v \in V$. The solution of this optimization problem, also called the optimum reaction of DM2, can be characterized by

\[ D_v\{ E_{x,y/z}\{ L(x, u^t_{z,y} - Q_{z,y}(v - v_z^t), v, \alpha^+) \} \}(h_v) = 0 , \quad \forall h_v \in V \tag{3.7} \]

where $D_v(\cdot)$ is the Frechet differential operator, $D_v(\cdot) \in \mathcal{L}(V,\mathbb{R}) = V^*$, where $V^*$ is the topological dual of V [8]. Since L is Frechet analytic and is an element of $L_2(\Omega, P(y/z))$, it is not hard to show that the expectations and differentiations are interchangeable, so that (3.7) rewrites as

\[ E_{x,y/z}\{ D_v(L) - Q^*_{z,y}\{ D_u(L) \} \}(h_v) = 0 , \quad \forall h_v \in V \tag{3.8} \]

where $D_u(\cdot)$ is the Frechet differential operator w.r.t. $u \in U$, with $D_u(\cdot) \in \mathcal{L}(U,\mathbb{R}) = U^*$, and $Q^*_{z,y}(\cdot)$ is the adjoint of $Q_{z,y}(\cdot)$, with $Q^*_{z,y}(\cdot) \in \mathcal{L}(U^*,V^*)$, for each fixed z,y.
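The passage from (3.7) to (3.8) is an application of the chain rule to the composite map $v \mapsto L(x, u(v), v, \alpha^+)$ with $u(v) = u^t_{z,y} - Q_{z,y}(v - v^t_z)$. A brief sketch, assuming the standard definition of the adjoint, $(Q^*\varphi)(h) = \varphi(Qh)$ for $\varphi \in U^*$ and $h \in V$:

```latex
\begin{align*}
D_v\big[ L(x, u(v), v, \alpha^+) \big](h_v)
  &= D_v L\,(h_v) + D_u L\,\big( D_v u\,(h_v) \big) \\
  &= D_v L\,(h_v) + D_u L\,( -Q_{z,y} h_v ) \\
  &= D_v L\,(h_v) - \big( Q^*_{z,y}\{ D_u L \} \big)(h_v) ,
  \qquad \forall h_v \in V .
\end{align*}
```

Taking the conditional expectation $E_{x,y/z}$ of both sides and interchanging it with differentiation then gives the form stated in (3.8).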

For the characterization of a near-optimum policy, we will need the Taylor expansion of $v_z(\alpha^+) = v_z(\alpha^* + h_A)$ about $\alpha^*$. Towards this goal, we note that (3.8), the necessary condition for optimality, is in fact an identity for all $\alpha \in \mathcal{A}$. Therefore, when we take the (ordinary) derivative of (3.8) with respect to $\alpha$ it will still be equal to zero. We then have

\[ E_{x,y/z}\{ d_\alpha D_v L + D_v^2(L)\, d_\alpha v - Q^*(D_u D_v L)\, d_\alpha v + (Q^*)^2 (D_u^2 L)\, d_\alpha v - Q^*(D_v D_u L)\, d_\alpha v - Q^*(d_\alpha D_u L) \}(h_v,h_A) = 0 , \quad \forall h_v \in V ,\ \forall h_A \in \mathcal{A} \tag{3.9} \]

where $D_v^2(\cdot) \in \mathcal{L}(V \times V ; \mathbb{R})$, $D_u^2(\cdot) \in \mathcal{L}(U \times U ; \mathbb{R})$, $(Q^*)^2 \in \mathcal{L}(U^* \times U^* ; V^*)$ and $d_\alpha$ represents the operator which takes the ordinary derivative with respect to the scalar $\alpha$. Rearranging (3.9), we obtain


E  {(D2L  -  Q*(D  D  L)  -  Q*(D  D  L)  +  (Q*)2(D2L)]d  v  +  d  D  L 
,  v  ^  u  v  vu  u  a  a  v 

x,y/z 


(3.10) 


-  q*(d  DL)]  (h  ,h . )  =  0  ,  ¥h  ev  ,  Vh  ,e/A  . 

x  a  u  v  A  ’  v  A 

We now observe that $(D_v^2 - Q^* D_u D_v - Q^* D_v D_u + (Q^*)^2 D_u^2)(\cdot)$ is a strictly positive
operator, due to the strict convexity of $L$ on $U \times V$, and therefore, if we can
find a $Q_{z,y} \in \mathcal{L}(V,U)$ such that its adjoint satisfies at $\alpha = \alpha^*$

$$E_{x,y/z}\{d_\alpha D_v L - Q^*(d_\alpha D_u L)\}(h_v, h_A) = 0, \quad \forall h_v \in V,\ \forall h_A \in \mathcal{A}, \tag{3.11}$$

we will necessarily have

$$d_\alpha v_z = 0. \tag{3.12}$$


Now, the Taylor series expansion of $v_z(\alpha^+)$ about $\alpha^*$ is

$$v_z(\alpha^+) = v_z(\alpha^*) + \sum_{n=1}^{\infty} \frac{1}{n!}\, d_\alpha^n(v_z(\alpha^*))\, h_A^n, \quad \forall h_A \text{ such that } \alpha^* + h_A = \alpha^+ \in \mathcal{A}. \tag{3.13}$$

If we can realize (3.12), $v_z(\alpha^+)$ will match $v_z(\alpha^*) = v_z^t$ up to first order,
$\forall \alpha^+ \in \mathcal{A}$; and the existence of such an expansion is guaranteed by the Fréchet
analyticity of $L$, as will be clear in the sequel. Towards the realization of
(3.13), we recall that $Q_{z,y}(\cdot)$ is a linear bounded and square integrable operator
under $P(y/z)$, mapping $V$ into $U$. Likewise, its adjoint $Q^*_{z,y}$ is a linear operator
with the above properties, mapping $U^*$ into $V^*$. One such
$Q^*_{z,y}(\cdot) \in \mathcal{L}(U^*, V^*)$ is


$$Q^*_{z,y}(\cdot) = \langle g_{z,y}, \cdot \rangle_{U^*}\, \ell_z \,/\, E_{x,y/z}\{\langle g_{z,y}, f_{z,y} \rangle_{U^*}\} \tag{3.14}$$

where $g_{z,y} \in U^*$, $\ell_z \in V^*$ and $f_{z,y} \in U^*$ satisfying

$$E_{x,y/z}\{\langle g_{z,y}, f_{z,y} \rangle_{U^*}\} \neq 0 \tag{3.15}$$

and $g_{z,y}$, $f_{z,y}$ and $\ell_z$ are nonvanishing and bounded functionals, which are also
weakly continuous in $(z,y)$, $(z,y)$ and $z$, respectively. Furthermore, for each square
integrable $u^* \in U^*$, and fixed $z \in \mathbb{R}^m$, $\langle g_{z,y}, u^* \rangle_{U^*}$ and $\langle f_{z,y}, u^* \rangle_{U^*}$ are elements of
$L_2(\Omega, P(y/z))$. When we plug (3.14) into the left-hand side of (3.11), the goal
will be accomplished if we can equate the following quantity to zero:
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E  {d  D  L  -  <g  ,d  D  L\  *2  }/  E  {<g  ,f  >  *}} 
x,y/z  aV  2’y  auA  x,y/z  2’y  2’yU 


=  E  d  D  L  -  <g  ,d  D  L>  *£  >/[ <g  ,f  >  *]} 
,  a  v  z.y  a  u  u  z  z,y  z,v  u* 

x,y/z  J  J 


(3.16) 


=  E  {d  D  L  <g„  „,f  >,*  -  *  <g,  „,d  D  L>  * } /  E  {<g„  ,f 


x,y/z 


a  v  z,y  z.y  u  z  z,y  u  u 


x,y/z 


z,y  z,y  u 


E  { [d  D  L  -  i  ]  E  {<g  ,f  -  d  D  L>  *}}/  E  {<g  ,f  >  *} 
x/z  aV  2  x,y/z  2’y  2’y  au  U  x.y/z  2’y  2’yU 


=  0 


For this, it is sufficient to choose a nonvanishing $g_{z,y}$ such that

$$E_{x,y/z}\{\langle g_{z,y}, f_{z,y} - d_\alpha D_u L \rangle_{U^*}\} = 0. \tag{3.17}$$

To realize (3.17), we pick two bounded nonvanishing elements $f_1$ and $f_2$ of $U^*$,
weakly continuous in $z$ and $y$, which are also square integrable under $P(y/z)$,
and let


g  *  <fn,f  -  d  D  L>  *f,  -  E  {<f.,f  -  d  D  L>  *}f_ 

z ,y  1  z,y  a  u  u  2  x>y(/z  1  z  ,y  a  u  u*  2 


(3.18) 


It is easy to verify that a $g_{z,y}$ characterized by (3.18) satisfies (3.17) without
violating (3.15) if

$$d_\alpha D_u L \neq 0. \tag{3.19}$$

We will now show that $Q^*_{z,y}(\cdot)$ characterized by (3.14) and (3.18) leads
to a well-defined policy. In these equations we had complete freedom over the
choice of $f_{z,y}$, $\ell_z$, $f_1$ and $f_2$. The only term to study is $d_\alpha D_u L$. Since $L$ is
Fréchet analytic, $d_\alpha D_u L$ is a bounded linear functional for each fixed $z$ and $y$.
The following lemma is useful in establishing the fact that it is also square
integrable under $P(y/z)$.

Lemma 1: The n'th order Fréchet derivative of $L$ w.r.t. $u$ is square-integrable
under $P(y/z)$, for $n = 1, 2, \ldots, N < \infty$.

Proof: Since $L$ is Fréchet analytic in $U$ in a domain $D$, there is an open sphere
$S$ of radius $r$ on which $L$ is locally bounded, say by $M(u)$. For each
$h \in U$ we have [4, p. 161]


|BjL(a)(h)|  < 


Now, since $M(u)$, being a local bound on $L$, is (square) integrable
under $P(y/z)$, by a Corollary to the Dominated Convergence Theorem [10, p. 126],
$D_u^n L$ is also square integrable under $P(y/z)$. □

In the above discussion, $L$ was evaluated at $u^t_{z,y}$, $v^t_z$ and $\alpha^*$, which
makes it a function of $z$ and $y$; in fact, it is continuous in these random
variables under the hypothesis cited while formulating the problem. The next
step is to prove that the proposed incentive policy given by (3.2), (3.14) and
(3.18) is a well-defined one, which is guaranteed if $Q^*_{z,y}$ is measurable in $z$ and $y$.
The following lemma is useful in establishing an even stronger regularity
property for $Q^*_{z,y}$.

Lemma 2: The n'th order Fréchet derivative of $L$ w.r.t. $u$ is weakly continuous
in $z$ and $y$, for $n = 1, 2, \ldots, N < \infty$.
Proof: We will first consider the dependence on $z$. Since $L^z(u)$ is a continuous
function of $z$, for a given $u$ ($u^t_{z,y}$ in this case), and given $z_0$ (e.g. the
realized observation), for each $\varepsilon > 0$, there exists a $\delta > 0$ such that

$$|z - z_0| < \delta \ \Rightarrow\ |M^z(u) - M^{z_0}(u)| < \varepsilon$$

where $M^{z_0}(u)$ is the local bound on $L$. Using the upper bound given in the proof
of Lemma 1 on the Fréchet derivative $D_u^n L^z(u)(h)$, there exists a $\delta' > 0$
such that for each $\varepsilon > 0$, we have

$$|z - z_0| < \delta' \ \Rightarrow\ \|D_u^n L^z(u)(h) - D_u^n L^{z_0}(u)(h)\| < \varepsilon, \quad \forall h \in U.$$

A similar proof applies to the $y$-dependence of $D_u^n L(u)(h)$. □

We now summarize the preceding discussion below.

Proposition 1: Let (3.19) be satisfied. Then, the optimum reaction of DM2,
$v_z(\alpha^+)$, can be matched to the team-optimum decision $v_z^t$ up to first order in
the Taylor expansion of $v_z(\alpha^+)$, $\forall \alpha^+ \in \mathcal{A}$, using an affine incentive policy
(3.2), where the adjoint of $Q_{z,y}(\cdot)$ satisfies (3.14), with $g_{z,y}$ given by
(3.18).

Proof: A policy with the above properties has been obtained by construction
prior to the statement of the proposition. The fact that this policy is square
integrable under $P(y/z)$ is guaranteed by Lemma 1. Lemma 2 implies that $Q^*_{z,y}$
is weakly continuous in $z$ and $y$, a stronger property than measurability.
Hence, this policy is an element of $\Gamma_1$. □

We now focus attention on the next term in the Taylor expansion of
$v_z(\alpha^+)$ about $\alpha^*$, $d_\alpha^2 v_z(\alpha)$. Taking the second derivative of (3.8) w.r.t. $\alpha$,
and also considering the fact that $d_\alpha v_z$ can be made equal to zero at $\alpha = \alpha^*$,
we obtain (at $\alpha = \alpha^*$)

$$E_{x,y/z}\{[D_v^2 L - Q^*[D_u D_v L] - Q^*[D_v D_u L] + (Q^*)^2[D_u^2 L]]\, d_\alpha^2 v + d_\alpha^2 D_v L - Q^*(d_\alpha^2 D_u L)\}(h_v, h_A^1, h_A^2) = 0, \quad \forall h_v \in V,\ \forall h_A^1, h_A^2 \in \mathcal{A}. \tag{3.20}$$

Now, if $Q^*_{z,y}(\cdot)$ satisfies

$$E_{x,y/z}\{d_\alpha^2 D_v L - Q^*(d_\alpha^2 D_u L)\}(h_v, h_A^1, h_A^2) = 0, \quad \forall h_v \in V \tag{3.21}$$

and simultaneously (3.11), both at $\alpha = \alpha^*$, then the optimum reaction of DM2 will
match the team-optimum decision up to the second order in the Taylor
expansion of $v_z(\alpha)$. Adopting the representation (3.14) for $Q^*_{z,y}(\cdot)$, as a
counterpart of (3.17) we obtain

$$E_{x,y/z}\{\langle g_{z,y}, f_{z,y} - d_\alpha^2 D_u L \rangle_{U^*}\} = 0 \tag{3.22}$$

which, together with (3.17), yields a characterization for $g_{z,y}$ annihilating
the first and second order terms in the Taylor expansion of $v_z(\alpha^+)$. To show
that (3.17) and (3.22) can be simultaneously satisfied, we define a new inner
product on $U^*$,

$$\langle\langle \cdot, \cdot \rangle\rangle = E_{x,y/z}\{\langle \cdot, \cdot \rangle_{U^*}\}. \tag{3.23}$$

Now, let $S^2$ be the subspace of $U^*$ generated by the pair $(f_{z,y} - d_\alpha D_u L,\ f_{z,y} - d_\alpha^2 D_u L)$.
One can construct a vector $e_{z,y}$ orthogonal to $S^2$ under $\langle\langle \cdot, \cdot \rangle\rangle$
by the Gram-Schmidt orthogonalization procedure, and equate this $e_{z,y}$ to $g_{z,y}$.
With this $g_{z,y}$, (3.17) and (3.22) will be simultaneously satisfied. As a
counterpart of (3.19), we also need

$$d_\alpha^2 D_u L \neq 0. \tag{3.24}$$

This discussion then leads to the following proposition which will be
proven rigorously in the final version of the paper, to be presented at CDC 84.

Proposition 2: Let (3.19) and (3.24) be satisfied. Then, the optimum reaction
of DM2, $v_z(\alpha^+)$, can be matched to the team-optimum decision $v_z^t$ up to second
order in the Taylor expansion of $v_z(\alpha^+)$, $\forall \alpha^+ \in \mathcal{A}$, using an affine policy (3.2),
where the adjoint of $Q_{z,y}(\cdot)$ satisfies (3.14), with $g_{z,y}$ being orthogonal to
the subspace $S^2$. □

Now, it is clear that this procedure can be extended to annihilate
$d_\alpha^n v$, $n = 1, \ldots, N$, where $N$ is an arbitrarily large positive integer, provided that
we have

$$d_\alpha^n D_u L \neq 0, \quad n = 1, \ldots, N. \tag{3.25}$$

To realize the above assertion, it is sufficient to choose a $g_{z,y}$ orthogonal
to the subspace $S^N$ generated by $\{f_{z,y} - d_\alpha^n D_u L\}$, $n = 1, \ldots, N$.

Let $\mathcal{A}$ be a bounded interval of the real line, and let $\bar{\mathcal{A}}$ denote a
closed set contained in $\mathcal{A}$. Then, by the Weierstrass Theorem [9], the Taylor expansion
of $v_z(\alpha^+)$ around $\alpha^*$ is a uniform approximation of $v_z(\alpha^+)$. On the other hand,
under (3.25), we can annihilate the terms in the Taylor expansion of $v_z(\alpha^+)$ up
to N'th order. This then leads to the following theorem.

Theorem 1: Let (3.25) be satisfied, and the value of $\alpha^+ \in \mathcal{A}$ be uncertain to
DM1. Then, there exists an affine incentive policy $\gamma_1^o \in \Gamma_1$ for DM1 such that
the optimum response of DM2 to this policy, $v_z^{\gamma_1^o}(\alpha^+)$, satisfies

$$\| v_z^{\gamma_1^o}(\alpha^+) - v_z^t \|_V < \varepsilon$$

where $\varepsilon$ is an arbitrarily small positive number. Such a policy is represented
by

$$\gamma_1^o(v, y, z) = u^t_{z,y} - Q^o_{z,y}(v - v_z^t)$$

where $Q^o_{z,y}(\cdot) \in \mathcal{L}(V,U)$, and its adjoint satisfies (3.14), with $g_{z,y}$ being
orthogonal to the subspace $S^N$. This $\gamma_1^o$ is called a near-optimal policy for DM1.

Of course, if $v_z(\alpha^+)$ has an exact representation in terms of a finite
number of powers in $h_A$, then this near-optimal policy becomes an optimal one.


Remark: The private information $y$ guarantees the existence of a nonzero $g_{z,y}$
orthogonal to the subspace $S^N$ generated by $\{f_{z,y} - d_\alpha^n D_u L\}$, under the inner product
$\langle\langle \cdot, \cdot \rangle\rangle$ defined in (3.23), provided that (3.25) holds. If the underlying decision
space $U$ is infinite dimensional, without the private information $y$, one can still
find a $g_z$ orthogonal to $S^N$ under the inner product $\langle \cdot, \cdot \rangle_{U^*}$. In this case,
the linear operator $Q^*_z(\cdot)$ is characterized by

$$Q^*_z(\cdot) = \langle g_z, \cdot \rangle_{U^*}\, \ell_z \,/\, \langle g_z, f_z \rangle_{U^*}. \tag{3.26}$$

However, when $U$ is finite dimensional, say of dimension $M$, with inner
product defined by the scalar multiplication of two vectors, no nonzero $g \in U^*$
can be orthogonal to more than $M-1$ linearly independent vectors, so that Theorem 1
would not hold true in general. The private information $y$ induces the inner
product $\langle\langle \cdot, \cdot \rangle\rangle$, under which it is possible to find a $g_{z,y}$ orthogonal to $S^N$,
regardless of the dimensionality of $U$, subject to some regularity conditions on $L$.
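The role of the private information in the Remark can be illustrated numerically. The following is only a minimal sketch under assumptions that are not in the text: $U^*$ is taken as $\mathbb{R}^M$, and $y$ takes $K$ discrete values with conditional probabilities `p`, so that the expectation inner product of (3.23) becomes a weighted sum. The names `inner` and `orthogonal_to_span` and the random stand-in vectors are hypothetical.

```python
import numpy as np


def inner(a, b, p):
    """Discrete stand-in for (3.23): sum_k p[k] * <a[k], b[k]>.

    a and b have shape (K, M): one vector of R^M for each of the K values of y.
    """
    return float(np.sum(p[:, None] * a * b))


def orthogonal_to_span(ws, candidate, p, tol=1e-12):
    """Gram-Schmidt: remove from `candidate` its components along span(ws)."""
    basis = []
    for w in ws:
        r = w.astype(float)
        for e in basis:
            r = r - inner(r, e, p) * e
        norm = np.sqrt(inner(r, r, p))
        if norm > tol:  # skip vectors already contained in the span
            basis.append(r / norm)
    g = candidate.astype(float)
    for e in basis:
        g = g - inner(g, e, p) * e
    return g


# Assumed toy data: M = 2 decision components, K = 4 values of y, N = 3 constraints.
rng = np.random.default_rng(0)
M, K, N = 2, 4, 3
p = np.full(K, 1.0 / K)                            # stand-in for P(y/z)
ws = [rng.normal(size=(K, M)) for _ in range(N)]   # stand-ins for f - d^n D_u L
g = orthogonal_to_span(ws, rng.normal(size=(K, M)), p)

residuals = [abs(inner(g, w, p)) for w in ws]
print(max(residuals), inner(g, g, p))
```

With $M = 2$ and a $g$ that does not depend on $y$, at most one independent vector could be annihilated. Letting $g$ vary with $y$ enlarges the space to $K \cdot M$ dimensions in this toy setting, which is why three constraints can be met here.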
IV. EXTENSIONS

The final version of the paper, to be presented at CDC 84, will
include rigorous proofs of Proposition 2 and Theorem 1 given above, and will
also discuss the case where $\alpha \in \mathcal{A}$ is a vector. It will be shown that the
results presented in this paper remain basically intact when $\alpha$ is a
finite-dimensional vector.

The final version of the paper will also include an infinite-horizon
LQG team decision problem with discounted cost functional, with an existing
discrepancy between the agents about the discount factor. Some numerical
studies will supplement the theoretical results.
ABSTRACT

An equilibrium theory is developed for multi-person multi-criteria
stochastic decision problems wherein the decision makers have different subjective
probability measures on the uncertain quantities. Particular attention is devoted
to existence and uniqueness of stable equilibria in such problems, when the loss
functionals are (locally) quadratic and the subjective probability measures are
Gaussian.


INTRODUCTION  AND  PROBLEM  FORMULATION 

Consider the class of two-person two-criteria stochastic decision problems
with loss functionals $L_1(x, u_1, u_2)$ and $L_2(x, u_1, u_2)$ for DM1 (first decision maker)
and DM2, respectively, where $u_1$, $u_2$ denote the decision variables [of DM1 and DM2,
respectively] belonging to some prescribed Hilbert spaces $U_1$ and $U_2$, and $x \in X$ stands
for the state of Nature. Let $y_1 \in Y_1$ and $y_2 \in Y_2$ be two stochastic variables, which
are correlated with $x$ and denote the measurements available to DM1 and DM2,
respectively, so that $u_i$ will be chosen as a measurable function of $y_i$, $i = 1, 2$, i.e.
$u_i = \gamma_i(y_i)$, where $\gamma_i$ belongs to a policy space $\Gamma_i$ which will be delineated in the
sequel. The sets $X$, $Y_1$ and $Y_2$ are assumed to be structured appropriately, so that
each is a well-defined Hilbert space.

So far we have adopted the standard decision-theoretic framework (see
e.g. Ferguson (1967)); we depart, however, from this standard formulation in the
description of the underlying probability space. Let $(\Omega, \mathcal{F})$ be a measurable space
to which the triple $(x, y_1, y_2)$ belongs; then, we assume that the decision makers
have different (not necessarily the same) subjective probability measures $\mathcal{P}_1$ and $\mathcal{P}_2$
[for DM1 and DM2, respectively] on this measurable space $(\Omega, \mathcal{F})$ and let the random
variables $(x, y_1, y_2)$ have finite second moments under both $\mathcal{P}_1$ and $\mathcal{P}_2$. Furthermore,
we take $\Gamma_i$ to be the Banach space of all measurable mappings $\gamma_i : Y_i \to U_i$, with the
additional property that $\gamma_i(y_i)$, viewed as a random variable, has finite second
moments.

Let $z \triangleq (x, y_1, y_2)$, and introduce, for each pair $(\gamma_1, \gamma_2) \in \Gamma_1 \times \Gamma_2$, the
quantity

$$J_i(\gamma_1, \gamma_2) = \int L_i(x, \gamma_1(y_1), \gamma_2(y_2))\, \mathcal{P}_i(dz) \tag{1}$$

as the expected cost function of DMi corresponding to the decision rules $(\gamma_1, \gamma_2)$ and
under DMi's subjective probability measure $\mathcal{P}_i$. [Here, we implicitly assume that
$L_i$ is integrable under $\mathcal{P}_i$.] We should note at this point that even in team problems
(with $L_1 \equiv L_2$) the decision makers will have different expected cost functions
whenever $\mathcal{P}_1$ and $\mathcal{P}_2$ do not match, since then a common probability space will not exist.

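Definition 1

A pair of policies $(\gamma_1^o, \gamma_2^o) \in \Gamma_1 \times \Gamma_2$ constitutes an equilibrium solution
to the decision problem formulated above if

$$J_1(\gamma_1^o, \gamma_2^o) \le J_1(\gamma_1, \gamma_2^o), \quad \forall \gamma_1 \in \Gamma_1 \tag{2a}$$

$$J_2(\gamma_1^o, \gamma_2^o) \le J_2(\gamma_1^o, \gamma_2), \quad \forall \gamma_2 \in \Gamma_2 \tag{2b}$$

□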
Our objective in this paper is to obtain conditions on $L_i$ and the
probability measures $\mathcal{P}_1$, $\mathcal{P}_2$, under which the decision problem formulated above will
admit a locally or globally stable equilibrium solution. We will, in particular,
consider the class of problems in which $L_1$ and $L_2$ are quadratic in the decision
variables $u_1$ and $u_2$, and also specialize our treatment to the case of jointly Gaussian
distributions. The special case of $\mathcal{P}_1 = \mathcal{P}_2$ has earlier been treated in Başar (1975)
and Başar (1978), where conditions, independent of the probabilistic structure of the
problem, have been obtained for stable equilibrium solutions. The present paper
discusses nontrivial extensions of these results to the case $\mathcal{P}_1 \neq \mathcal{P}_2$, and only
outlines the method of approach and the solution because of space limitations.


QUADRATIC  PROBLEMS  AND  GENERAL  CONDITIONS 
FOR  EXISTENCE  OF  A  STABLE  EQUILIBRIUM 
Let  L^  and  L.,  be  defined  by 

Ljix.u.vl  -  ~  ^..u^  +  4  <u-.'D2iu2>  '  '  <w2,F^x>  -  <u1.ci21;2>  <3a) 

i  n  i  2  2  2 

L,(x,u,vi  -  T  <UpD*,u^>  +  -  <u„,u.,>  -  <u,,D“^u^>  -  <J1,Fj^x>  -  <u2,F2x>  (3b) 

L  2 

where  D,,  :  l'.  -  i\  an d  D^:  l! ^  -  L\  are  strongly  positive  operators,  and  we  do 

not  differentiate  between  inner  products  defined  on  different  Hilbert  spaces.  Let 
El[-(tl!y1]  denote  the  expectation  of  a  ^-measurable  random  variable  u(a)  condition¬ 
ed  on  the  random  variable  yt  and  under  the  probability  measure  9^.  The  following 
two  results  now  follow  readily  from  the  analyses  of  Ba$ar  (1975)  and  Ba,ar  (1978). 


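Proposition 1

A pair of policies $(\gamma_1^o, \gamma_2^o) \in \Gamma_1 \times \Gamma_2$ constitutes an equilibrium solution
to the two DM decision problem with quadratic loss functionals (3) if and only if
it satisfies the pair of equations

$$\gamma_1^o(y_1) = D_{12}\, E^1[\gamma_2^o(y_2) \mid y_1] + F_1^1\, E^1[x \mid y_1] \tag{4a}$$

$$\gamma_2^o(y_2) = D_{21}\, E^2[\gamma_1^o(y_1) \mid y_2] + F_2^2\, E^2[x \mid y_2] \tag{4b}$$

□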

Proposition  2 

A  pair  of  policies  (v?,v?)  €  T.  x  constitutes  a  stable  equilibrium 
(o)  (o)1  *  1  1 

solution  if,  for  all  (y^  .Yj  )  €  x  fj, 

v°(y.)  -  lim  \{n)(yi)  a.e.  ^ 
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Vin)(yi)  ■ 

+  D^FjE^E^xly^ly^  +  F^x^l, 

i.J  ■  1.2;  j»*t;  0-1,2.  .  . 

Furthermore,  such  an  equilibrium  solution  is  necessarily  unique. 

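Let us introduce linear operators $\mathcal{L}_i : \Gamma_i \to \Gamma_i$, $i = 1, 2$, by

$$\mathcal{L}_i(\gamma) = D_{ij} D_{ji}\, E^i[E^j[\gamma(y_i) \mid y_j] \mid y_i], \quad j \neq i;\ i, j = 1, 2.$$

Then, in view of Proposition 2, the quadratic decision problem will admit a unique
stable equilibrium solution if, and only if, $\mathcal{L}_1$ and $\mathcal{L}_2$ are contraction mappings, i.e.
there exists a constant $\rho$, $0 < \rho < 1$, such that

$$\|\mathcal{L}_i\| \triangleq \sup_{\gamma \in \Gamma_i} \{\langle\langle \gamma(y_i), \mathcal{L}_i \gamma(y_i) \rangle\rangle / \langle\langle \gamma(y_i), \gamma(y_i) \rangle\rangle\} \le \rho, \quad i = 1, 2, \tag{7}$$

where $\langle\langle \cdot, \cdot \rangle\rangle$ denotes the inner product on $\Gamma_1$ or $\Gamma_2$. Since
$\|\mathcal{L}_i\| \le \|D_{ij} D_{ji}\| \, \|E^i[E^j[\,\cdot \mid y_j] \mid y_i]\|$ by using a well-known property of linear operators defined on
Banach spaces, a set of sufficient conditions for $\mathcal{L}_i$ to be a contraction mapping is
existence of a pair $(\rho_1, \rho_2)$, $0 < \rho_1, \rho_2 \le 1$, $\min(\rho_1, \rho_2) < 1$, such that

1) $\|D_{ij} D_{ji}\| \le \rho_1$ (8a)

2) $\|E^i[E^j[\,\cdot \mid y_j] \mid y_i]\| \le \rho_2$, (8b)

which is a complete separation (in terms of sufficient conditions) of the
deterministic and stochastic parts of the system.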

Now, if the decision problem is a team problem with a common loss functional
$L = L_1 \equiv L_2$ (which requires $D^1_{22} = I$, $D_{12} = D_{21}^*$, $F_1^1 = F_1^2$, $F_2^1 = F_2^2$), and if $L$ is
strictly convex in the pair $(u_1, u_2)$, (8a) is always satisfied with $\rho_1 < 1$. If, furthermore,
the subjective probabilities $\mathcal{P}_1$ and $\mathcal{P}_2$ are the same, the second part of the linear
operator $\mathcal{L}_i$ becomes a projection operator, thus leading to satisfaction of the second
condition (8b) with $\rho_2 = 1$. Hence, for the strictly convex quadratic team problem
with $\mathcal{P}_1 = \mathcal{P}_2$, there exists a unique stable equilibrium solution (the so-called
team-optimal solution), irrespective of the underlying common probability distribution,
a result which is already well-established in the literature (see Radner (1962),
Başar (1978)). However, for team problems with $\mathcal{P}_1 \neq \mathcal{P}_2$, such a result no longer
holds true, because the second part of the operator is not necessarily a projection
operator, i.e. we may not be able to find a $\rho_2$, $0 < \rho_2 \le 1$, to satisfy (8b). The
general condition then is (8b), which places some restrictions on the probability
measures $\mathcal{P}_1$ and $\mathcal{P}_2$.
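The effect of mismatched priors on (8b) can be seen in a scalar sketch. The code below is an illustration under assumptions that are not made in the text: $y_1$ and $y_2$ are scalar, zero-mean and jointly Gaussian under each measure, and only linear policies $\gamma(y_i) = k\,y_i$ are considered, so that the composed conditional expectation reduces to multiplication by a scalar gain. The function names and the covariance values are hypothetical.

```python
def regression_coeff(cov, target, given):
    """Coefficient c in E[y_target | y_given] = c * y_given (zero-mean Gaussian)."""
    return cov[target][given] / cov[given][given]


def composition_gain(cov_i, cov_j, i, j):
    """Gain c such that E^i[ E^j[y_i | y_j] | y_i ] = c * y_i.

    cov_i, cov_j: 2x2 covariances of (y_1, y_2) under DM i's and DM j's measures.
    i, j: 0-based positions of y_i and y_j in (y_1, y_2).
    """
    inner_step = regression_coeff(cov_j, i, j)  # E^j[y_i | y_j] = inner_step * y_j
    outer_step = regression_coeff(cov_i, j, i)  # E^i[y_j | y_i] = outer_step * y_i
    return inner_step * outer_step


# Common measure: the gain is the squared correlation, hence at most 1.
common = [[1.0, 0.9], [0.9, 1.0]]
gain_common = composition_gain(common, common, 0, 1)

# Mismatched subjective covariances (assumed values).
dm1 = [[1.0, 0.9], [0.9, 1.0]]
dm2 = [[4.0, 1.9], [1.9, 1.0]]
gain_mismatch = composition_gain(dm1, dm2, 0, 1)

print(gain_common, gain_mismatch)
```

In this restricted linear setting the common-measure gain cannot exceed one, whereas the mismatched pair gives a gain above one, so no $\rho_2 \le 1$ would work for it.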
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To  invescigate  this  question  somewhat  further,  let  us  assume  that  *  IK 
-  IK ,  and  that  & ^  admits  a  probability  density  function  (with  respect  to  the 
oesguc  measure)  denoted  pl(x,y^,y2).  By  an  abuse  of  notation,  let  us  denote  the 
i^rginal  and  conditional  densities  that  involve  y^  and  y^  by  p  (y^ )  and  pl(y^|>\), 

”  spectively.  Then  for 

E1lEJh(.y1)|y.)  |yj  “»  VcyJyj)  \Cy1)p(y1|yj)dyidy  .  2 

-  ,7dyjdyi  pl<yj)  pJ(yl|yj)  <y(yi>’  v(yi)>  (9a) 

-  "Fi(y1)  «y(yt),  v(yt)>  p\y1>dyi 

where 

J  Fi(yi)  l  E1{(Pj(yi|yj)/p1(yi|y.)] j } 

i  i  i  (9b) 

*  dy^p  (y  |yt)  pJ(yi|yJ)/p  Cyt I y j ) . 

|  and  in  arriving  ac  the  inequality  we  have  made  repeated  use  of  the  Cauchy-Bunlakowski 
inequality .  Now,  under  the  condition 

(  F.(y.)  <1  Tyx  €  f"1  ,  1  =  1,2,  (10) 

I 

(9a)  can  be  rewritten  as 

IE1(Ej[v(yi)|yj]|yi]"2  <  «Y(yi>,  v(yi>» 

thereby satisfying (8b) with $\rho_2 = 1$. Also note that it will be sufficient for (8a)
and (8b) to be satisfied for only one $i$ ($i = 1$ or 2), since if, for example,
$\|\mathcal{L}_1\| \le \rho < 1$, (5) admits a well defined limit for $i = 1$ as $n \to \infty$, which implies through
(4b) that $\lim \gamma_2^{(n)}$ is also well-defined. This then leads to the following Corollary
to Proposition 2.

Corollary 1

If conditions (8a) and (10) are satisfied for $i = 1$ or 2, with $0 < \rho_1 < 1$,
the quadratic decision problem [with $Y_i$ taken as Euclidean spaces] admits a unique
stable equilibrium solution. □

Remark 1

If $\mathcal{P}_1 = \mathcal{P}_2$, then $F_i(y_i) = 1$ $\forall y_i \in Y_i$, and hence (10) is always satisfied. □

JOINTLY  GAUSSIAN  DISTRIBUTIONS  AND 
DERIVATION  OF  EXPLICIT  SOLUTIONS 

To  explore  the  extent  of  the  restrictions  imposed  by  condition  (10)  on  the 
probabilistic  structure  of  the  problem, we  now  further  assume  that  the  random  vectors 
are  Jointly  Gaussian  distributed,  with  mean  zero  and  covariances 
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1-1,2, 


(17) 


.Sure  (L.,L,)  constitutes  the  unique  solution  to  the  Liapunov-type  matrix  equations 


'1 

L1  •  V.C’WC  ■  '  'ft'.’C  •  °  <IS" 


h  -  Vl.-S  'L.^XvC  ‘  '  ° 


J 


Proof 

The first part (i.e. existence and uniqueness) follows from Corollary 1 and
the discussion that precedes (14), also in view of the original contraction mapping
inequality (7). The second part follows by noting that if $(\gamma_1^{(0)}, \gamma_2^{(0)})$ are taken to
be linear in $(y_1, y_2)$ in (5), all terms of the sequence are linear and hence the limit
$(\gamma_1^o, \gamma_2^o)$ in Proposition 2 is linear. Denoting the coefficient gain matrices by $(L_1, L_2)$,
we readily arrive at (18a)-(18b) through straightforward manipulations. □
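The argument in the proof, that linear starting policies stay linear under the iteration and converge to linear equilibrium policies, can be sketched numerically. The following is only an illustration under assumptions: all quantities are scalar and zero-mean Gaussian, and the reaction equations are taken in the form $u_i = d_{ij}\,E^i[u_j \mid y_i] + f_i\,E^i[x \mid y_i]$, which is one reading of (4a)-(4b). The function names, sign conventions and covariance values are hypothetical and are not the matrix equations (18a)-(18b) themselves.

```python
import numpy as np


def conditional_coeffs(cov, own, other):
    """For zero-mean Gaussian (x, y_1, y_2) with covariance `cov` (index 0 is x),
    return (a, b) with E[y_other | y_own] = a * y_own and E[x | y_own] = b * y_own.
    """
    a = cov[other, own] / cov[own, own]
    b = cov[0, own] / cov[own, own]
    return a, b


def iterate_gains(cov1, cov2, d12, d21, f1, f2, n_iter=200):
    """Iterate the assumed reaction equations on linear policies u_i = k_i * y_i."""
    a1, b1 = conditional_coeffs(cov1, 1, 2)  # under DM1's subjective measure
    a2, b2 = conditional_coeffs(cov2, 2, 1)  # under DM2's subjective measure
    k1 = k2 = 0.0
    for _ in range(n_iter):
        k1, k2 = d12 * a1 * k2 + f1 * b1, d21 * a2 * k1 + f2 * b2
    return k1, k2


def solve_gains(cov1, cov2, d12, d21, f1, f2):
    """Fixed point of the same equations, obtained from a 2x2 linear system."""
    a1, b1 = conditional_coeffs(cov1, 1, 2)
    a2, b2 = conditional_coeffs(cov2, 2, 1)
    lhs = np.array([[1.0, -d12 * a1], [-d21 * a2, 1.0]])
    rhs = np.array([f1 * b1, f2 * b2])
    return tuple(np.linalg.solve(lhs, rhs))


# Assumed subjective covariances of (x, y_1, y_2) for DM1 and DM2.
cov1 = np.array([[1.0, 0.8, 0.6], [0.8, 1.0, 0.5], [0.6, 0.5, 1.0]])
cov2 = np.array([[1.0, 0.7, 0.9], [0.7, 1.0, 0.6], [0.9, 0.6, 1.0]])
k_iter = iterate_gains(cov1, cov2, d12=0.5, d21=0.5, f1=1.0, f2=1.0)
k_exact = solve_gains(cov1, cov2, d12=0.5, d21=0.5, f1=1.0, f2=1.0)
print(k_iter, k_exact)
```

Here the product of the coupling terms is well below one in magnitude, so the iteration contracts and the iterated gains agree with the directly solved fixed point.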

Conditions for existence of an equilibrium solution are of course less
restrictive than those under which the statement of Theorem 1 is valid. In fact, the
solution depicted in Theorem 1 will constitute an equilibrium solution whenever there
exists a pair $(L_1, L_2)$ satisfying (18a)-(18b). A sufficient condition for this (which
is less restrictive than (14) and (16)) is provided in the following proposition,
whose proof follows readily from the proof of the second part of Theorem 3 of Başar
(1975).


Proposition  3 

The quadratic Gaussian decision problem of Theorem 1 admits an equilibrium
solution (not necessarily stable) given by (17)-(18), if for at least one $i = 1, 2$,


I*  max  (rWi>l  «  1 


(19a) 


-1 


(T1  r1 

Vj  yj 


yJyl  y 


,-1 

^  >1  <  1. 

i 


(19b) 


where $\lambda_{\max}(A)$ denotes the eigenvalue of $A$ which is maximum in absolute value. □

For the purpose of illustrating the various conditions of existence obtained
above, we now consider a family of scalar team problems, with the decision makers
having different subjective probabilities on the uncertain quantities. To be more


specific,  let  D. 


22 


Jli 


l.  D 


12 


21 


d,  |d|  <  1,  “  fj.  fj  -  Fj  -  f2.  and 
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id  If  either 

;  M  "  (20) 

’  ^  ‘  1/k2  (21) 
ki  ■  oftlloft  -  (o^)*  (i  -  °j|Oj)l;  ir1). 

which  are  the  conditions  for  existence  of  a  stable  equilibrium  solution. 

If  we  are  interested  only  in  existence  of  equilibrium  solutions  (not 

necessarily  stable),  the  conditions  (20)-(21)  can  be  relaxed.  The  conditions,  in 

this  case,  follow  from  (19a)-(19b)  to  be  I d I  <  1  and  either  a^oi  <  o?a*  or 
1212  22-11 
ctjCj  -  °292  “^ich  are  always  satisfied,  provided  that  the  loss  function  is  strictly 

convex in $(u_1, u_2)$. Hence, the conclusion is that even if the subjective
probabilities are different, the scalar Gaussian team problem with strictly convex loss
functional admits an equilibrium solution; this solution, however, is not necessarily
stable and additional conditions (such as (20) or (21)) have to be imposed to ensure
stability.

CONCLUDING  REMARKS 

The applicability of the general approach of this paper is not restricted
to the class of quadratic two-person stochastic decision problems analyzed here in
considerable depth, but can readily be extended to multi-person stochastic decision
problems in which the decision makers have different subjective probabilities on
the uncertain quantities governing the decision process. Extensions are also
possible to nonquadratic loss functionals, in which case we investigate existence and
uniqueness of locally stable equilibrium solutions. Because of space limitations,
we have not been able to discuss such extensions in the present paper.


Then,  conditions  (14)  and  (16)  are  satlsf 
1,  2  li  2 

“lK  -  °2|o2 


2.  1  2,  1 
°2|02  -  °ll°l 

are  satisfied,  where 


1  O  4  4 
Abstract

This  paper  discusses  stochastic  multi-agent  team 
problems  wherein  the  decision  makers  have  different 
probabilistic  models  of  the  underlying  decision  process. 

A  suitable  equilibrium  solution  concept  is  introduced 
for  such  decision  problems  which  exhibit  probabilistic 
multi-modeling,  and  the  existence,  uniqueness  and 
stability  properties  of  this  equilibrium  solution  are 
studied  under  static  information  patterns.  The  special 
case  of  Gaussian  distributions  is  studied  in  some  depth, 
and  some  explicit  equilibrium  policies  are  derived  for 
both discrete and continuous-time team problems.

1 .  Introduction 

A team is defined as a group of agents who work
together in a coordinated effort, in a possibly hostile
and uncertain environment, in order to achieve a common
goal. In achieving this goal, the members of the team
do not necessarily acquire the same information, and
hence they have to operate in a decentralized mode of
decision making. The scientific approach to formulation
and analysis of team problems has involved (i) a
quantification of the underlying common goal in the form
of a (mathematical) objective function which is sought
to be optimized jointly by the agents, and (ii) a modeling
of the uncertain environment and the possible
measurements made by the agents on the environment in
the form of a probability space together with an
appropriate  information  structure  il.3,1.'].  The  under¬ 
lying  stipulation  here  has  been  the  existence  of  a 
probability space that is common to all the agents, so
that through their priors all members of the team "see
the world" in exactly the same way.

One question that readily comes to mind at this
point is the robustness of such a mathematical model,
and the "optimum" solutions it produces, to slight
variations in the underlying assumptions. In particular,
what if the agents perceive the outside world in slightly
different ways? Would the solution obtained under
the assumption of common prior probability measures
change drastically if there are discrepancies in the
decision makers' (DMs') perceptions of the probabilistic
description of the outside world? In order to be able
to answer these queries satisfactorily and effectively,
we need a theory of equilibrium for decision problems
in which the DMs have different probabilistic models of
the system; such a general theory will clearly subsume
the currently available results on teams which use a
common probability space.

Consider a static team decision problem, formulated
in the standard manner as in [7], with the only difference
being in the underlying probability space. In particular,
assume that the DMs assign different subjective
probabilities to the uncertain events, in which case
there will not exist a common probability space, thereby
leading to a different expected (average) cost function
for each DM. Hence, once we relax the assumption of
existence of a common probability space, the team problem
is no longer a stochastic optimization problem with
a single objective functional, and we inevitably have
to  treat  it  as  a  nonzero-sum  stochastic  game  15,5.12). 
Furthermore, even though the original team decision
problem with a common probability space will admit the
same team-optimal solution(s) regardless of the mode of
decision making (that is, regardless of whether the
roles of the DMs are symmetric or whether there is a
hierarchy and dominance in decision making), this feature
ceases to hold true when there exists a discrepancy
between the perceived probability measures. When there
are only two members, for example, two possibilities
emerge in the presence of discrepancies: the totally
symmetric roles, corresponding to the Nash equilibrium
solution, and the hierarchical mode, corresponding to
the Stackelberg equilibrium solution.

Motivated by these considerations, we treat in this
paper a general class of two-person stochastic team
problems which can be viewed as static stochastic
nonzero-sum games with the DMs having different subjective
probability measures. Adopting the symmetric mode
of decision making, we introduce the so-called "stable
equilibrium solution" concept for such problems, and
develop a general theory when the objective functionals
are quadratic and the decision spaces are appropriate
Hilbert spaces. Such a formulation includes both
finite-dimensional (discrete) and continuous-time
decision problems, and involves arbitrary probability
measures which are, however, restricted by
the conditions of existence and uniqueness developed
in the paper. The special case of Gaussian distributions
is studied in considerable depth, and some explicit
solutions are obtained with appealing features.

In the next section (§2) we provide a precise problem
formulation, and introduce the solution concept
adopted in this paper. Section 3 develops general
conditions for existence and uniqueness of a stable
equilibrium solution, and elucidates the extent of the
restrictions imposed on the problem by these conditions.
Section 4 deals with the special class of Gaussian
distributions, verifies the existence of unique linear
stable equilibrium solutions and provides explicit
expressions for these solutions. Furthermore, some
special cases, such as the finite-dimensional and
continuous-time problems, are also discussed in this
section. Because of space limitations we do not provide
verification of the major results in this paper; detailed
proofs can be obtained from the author upon request.


2. Mathematical Formulation and Some Basic Results
Probability Spaces

Let $\Omega = \mathbb{R}^{n} \times \mathbb{R}^{m_1} \times \mathbb{R}^{m_2} = X \times Y_1 \times Y_2$,
let $\mathcal{B}$ denote the Borel field of subsets of $\Omega$, and let
$\mathcal{B}^{k}$ denote the Borel field of subsets of $\mathbb{R}^{k}$,
$k = n, m_1, m_2$. Let $\mathcal{P}$ denote the set of all probability
measures on $(\Omega, \mathcal{B})$ with finite second moments, and for
each $P \in \mathcal{P}$ denote the corresponding marginal measures on
$\mathcal{B}^{n}$, $\mathcal{B}^{m_1}$ and $\mathcal{B}^{m_2}$ by $P_x$,
$P_{y_1}$ and $P_{y_2}$, respectively. Furthermore, let the collections of
all such marginal measures be denoted by $\mathcal{P}_x$, $\mathcal{P}_{y_1}$
and $\mathcal{P}_{y_2}$, respectively. Then, for each $P \in \mathcal{P}$, the
vector $z = (x', y_1', y_2')'$, taking values in $\Omega$, becomes a
well-defined random vector on $(\Omega, \mathcal{B}, P)$, and likewise $x$
is a random vector on $(\mathbb{R}^{n}, \mathcal{B}^{n}, P_x)$ and $y_i$ is a
random vector on $(\mathbb{R}^{m_i}, \mathcal{B}^{m_i}, P_{y_i})$.

Here, $x$ denotes the unknown state of Nature, and $y_i$
denotes an observation of DMi (the i'th decision maker)
which is correlated with $x$. We now choose two elements
out of $\mathcal{P}$, $P^1$ and $P^2$, which denote the subjective
probabilities assigned to $z$ by DM1 and DM2, respectively.
For technical reasons, we place some further restrictions
on the choices of $P^1$ and $P^2$ through the marginals $P^i_{y_j}$;
in particular we assume that

Condition (1). $P^1_{y_2}$ and $P^2_{y_1}$ are absolutely continuous
[1] with respect to $P^2_{y_2}$ and $P^1_{y_1}$, respectively; that is,
using the standard notation in probability theory,

$$P^1_{y_2} \ll P^2_{y_2}, \qquad P^2_{y_1} \ll P^1_{y_1}. \qquad (1)$$


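Condition (2). The nonincreasing sequence of numbers
$\{\alpha^i_k,\ k > 0\}$ defined by

$$\alpha^i_k = P^i_{y_i}(\{\xi : g_i(\xi) > k\}), \quad k > 0, \quad i = 1, 2, \qquad (2a)$$

where $g_i(\xi)$ is the Radon-Nikodym derivative [1]

$$g_i(\xi) = dP^j_{y_i} / dP^i_{y_i}, \quad j \neq i, \qquad (2b)$$

has the property that

$$\alpha^i_k = 0 \ \text{for all } k > K^i, \ \text{for some } K^i < \infty. \qquad (2c)$$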
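As a hedged illustration of what condition (2) asks for, the sketch below evaluates the Radon-Nikodym derivative of one zero-mean scalar Gaussian measure with respect to another. The scalar Gaussian setting and all function names are assumptions made here for illustration; they are not part of the formulation above. In this special case the derivative is simply the ratio of the two densities, and it is uniformly bounded exactly when the inverse variance of the numerator measure is at least that of the denominator measure.

```python
import math


def gaussian_pdf(xi, var):
    """Density of a zero-mean scalar Gaussian with variance `var`."""
    return math.exp(-0.5 * xi * xi / var) / math.sqrt(2.0 * math.pi * var)


def rn_derivative(xi, var_num, var_den):
    """g(xi) = dP_num / dP_den for two zero-mean scalar Gaussians."""
    return gaussian_pdf(xi, var_num) / gaussian_pdf(xi, var_den)


def is_uniformly_bounded(var_num, var_den):
    """g is bounded on the real line iff 1/var_num - 1/var_den >= 0."""
    return 1.0 / var_num - 1.0 / var_den >= 0.0


def sup_on_grid(var_num, var_den, half_width=10.0, steps=2001):
    """Largest value of g on a symmetric grid (a numerical probe only)."""
    pts = [-half_width + 2.0 * half_width * k / (steps - 1) for k in range(steps)]
    return max(rn_derivative(p, var_num, var_den) for p in pts)
```

When the numerator measure has the smaller variance, the supremum is attained at the origin and a finite threshold as in (2c) exists; when it has the larger variance, the ratio grows without bound in the tails and no such threshold can be found.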

x 


conditioned on the random variable $y_i$, and under the
probability measure $P^i$, i.e.

$$E^i[\psi(z) \mid y_i] = \int_{\Omega} \psi(z) \, P^i(dz \mid y_i) \qquad (7)$$

where the second term of the integrand is the conditional
probability measure derived from $P^i$. Then, for each
pair $(\gamma_1, \gamma_2) \in \Gamma_1 \times \Gamma_2$, we have a quadratic cost functional
for each DM, defined for DMi by

$$J_i(\gamma_1, \gamma_2) = \tfrac{1}{2}\langle \gamma_i, \gamma_i \rangle_i + \tfrac{1}{2}\int_{Y_j} (\gamma_j(\xi), \gamma_j(\xi))_j \, P^i_{y_j}(d\xi) - \langle \gamma_i, E^i[F_i x \mid y_i] \rangle_i - \int_{X \times Y_j} (\gamma_j(\xi), F_j x)_j \, P^i(dx, Y_i, d\xi) - \langle \gamma_i, E^i[D_{ij}\gamma_j(y_j) \mid y_i] \rangle_i, \qquad (8a)$$

which is derived from a common strictly convex quadratic
team cost functional. The strict convexity requirement
is met if we choose $D_{12}$ to satisfy

,D12D12,l-  ®  ‘  1  •  (8b) 

Equilibrium  Solution 

Since the expected cost functionals (8a), together
with the policy spaces, provide a normal (strategic)
form description, regardless of the presence of multiple
probability measures, the standard definition of
noncooperative (Nash) equilibrium [5] (which we adopt as
our solution concept) remains intact, as given below.


Decision and Policy Spaces

The decision variable of DMi will be denoted by $u_i$,
which belongs to a real Hilbert space $U_i$ with inner
product $(\cdot,\cdot)_i$. Permissible policies (decision rules)
for DMi are measurable mappings

$$\gamma_i : \mathbb{R}^{m_i} \to U_i \qquad (3)$$


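Definition 1.

A pair of policies $(\gamma_1^o, \gamma_2^o) \in \Gamma_1 \times \Gamma_2$ constitutes
a Nash equilibrium solution if

$$J_1(\gamma_1^o, \gamma_2^o) \le J_1(\gamma_1, \gamma_2^o), \quad \forall \gamma_1 \in \Gamma_1, \qquad (9a)$$

$$J_2(\gamma_1^o, \gamma_2^o) \le J_2(\gamma_1^o, \gamma_2), \quad \forall \gamma_2 \in \Gamma_2. \qquad (9b)$$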


satisfying the square-integrability conditions

$$\int_{Y_i} \| \gamma_i(\xi) \|_i^2 \, P^j_{y_i}(d\xi) < \infty, \quad j = 1, 2, \qquad (4)$$

where $\| \cdot \|_i$ is the natural norm derived from $(\cdot,\cdot)_i$.

Note that the condition (4) requires that the permissible
policies of each DM have bounded second-order moments
under both probability measures.

Let $\Gamma_i$ denote the space of all permissible policies
(3) of DMi satisfying (4), and which is equipped with
the metric

$$d_i(\gamma, \beta) = \left\{ \int_{Y_i} \| \gamma(\xi) - \beta(\xi) \|_i^2 \, P^i_{y_i}(d\xi) \right\}^{1/2}, \quad \gamma, \beta \in \Gamma_i. \qquad (5)$$

Then, we have the following basic result on the topological
structure of $\Gamma_i$.

Lemma 1. If the underlying probability measures satisfy
the conditions (1) and (2), $\Gamma_i$ equipped with the metric
(5) is a Banach space. □


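Definition 2.

A Nash equilibrium solution $(\gamma_1^o, \gamma_2^o)$ is locally
stable if there exists an $\varepsilon > 0$ and an open neighborhood
$N_\varepsilon(\gamma_1^o, \gamma_2^o) \subset \Gamma_1 \times \Gamma_2$ of $(\gamma_1^o, \gamma_2^o)$ such that for all
$(\gamma_1^{(0)}, \gamma_2^{(0)}) \in N_\varepsilon$,

$$\gamma_i^o = \lim_{k \to \infty} \gamma_i^{(k)}, \quad i = 1, 2, \qquad (10)$$

where

$$\gamma_1^{(k)} = \arg\min_{\gamma_1 \in \Gamma_1} J_1(\gamma_1, \gamma_2^{(k-1)}), \qquad (11a)$$

$$\gamma_2^{(k)} = \arg\min_{\gamma_2 \in \Gamma_2} J_2(\gamma_1^{(k-1)}, \gamma_2), \quad k = 1, 2, \ldots \qquad (11b)$$

Definition 3.

A locally stable Nash equilibrium solution $(\gamma_1^o, \gamma_2^o)$
is (globally) stable if $N_\varepsilon(\gamma_1^o, \gamma_2^o) = \Gamma_1 \times \Gamma_2$.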


Furthermore, introduce the inner-product $\langle \cdot, \cdot \rangle_i$ on
elements of $\Gamma_i$ by

$$\langle \gamma, \beta \rangle_i = \int_{Y_i} (\gamma(\xi), \beta(\xi))_i \, P^i_{y_i}(d\xi) \qquad (6)$$

which makes $\Gamma_i$ a Hilbert space. We are now in a position
to introduce the cost functionals for the two DMs.

Let $D_{ij} : U_j \to U_i$ and $F_i : X \to U_i$ ($i \neq j$; $i, j = 1, 2$) be
bounded linear operators with $D_{12} = D_{21}^{*}$. Furthermore,
let $E^i[\psi(z) \mid y_i]$ denote the mathematical expectation of
a $\mathcal{B}$-measurable random variable $\psi(z)$ taking values in


3. General Conditions for a Stable Equilibrium Solution

We now obtain some general conditions for existence
of stable equilibrium solutions. Because of space
limitations we simply give the main results without
verification.

Proposition 1.

A pair of policies $(\gamma_1^o, \gamma_2^o) \in \Gamma_1 \times \Gamma_2$ constitutes an
equilibrium solution to the decision problem of section 2 if,
and only if, it satisfies the pair of equations

$$\gamma_1^o(y_1) = D_{12} E^1[\gamma_2^o(y_2) \mid y_1] + F_1 E^1[x \mid y_1], \qquad (12)$$

$$\gamma_2^o(y_2) = D_{21} E^2[\gamma_1^o(y_1) \mid y_2] + F_2 E^2[x \mid y_2]. \qquad (13)$$
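A brief sketch of why the pair of equations in Proposition 1 can be read as stationarity conditions, assuming that only the first, third and last terms of the cost functional (8a) depend on DMi's own policy, and that the function $h_i$ (a symbol introduced here purely for illustration) belongs to $\Gamma_i$:

```latex
% For a fixed policy \gamma_j of the other DM, the \gamma_i-dependent part of J_i is
\tfrac{1}{2}\langle \gamma_i, \gamma_i \rangle_i - \langle \gamma_i, h_i \rangle_i
  = \tfrac{1}{2}\langle \gamma_i - h_i,\ \gamma_i - h_i \rangle_i
    - \tfrac{1}{2}\langle h_i, h_i \rangle_i ,
\qquad
h_i(y_i) := D_{ij}\, E^i[\gamma_j(y_j) \mid y_i] + F_i\, E^i[x \mid y_i] .
% The first term on the right is nonnegative and vanishes only at \gamma_i = h_i,
% so the unique best response of DMi to \gamma_j is \gamma_i = h_i.
```

Requiring each policy to be a best response to the other then gives one equation of this form for each DM.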

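Proposition 2.

A pair of policies $(\gamma_1^o, \gamma_2^o) \in \Gamma_1 \times \Gamma_2$ constitutes a
stable equilibrium solution if, and only if, for all
$(\gamma_1^{(0)}, \gamma_2^{(0)}) \in \Gamma_1 \times \Gamma_2$,

$$\gamma_i^o(y_i) = \lim_{k \to \infty} \gamma_i^{(k)}(y_i) \ \text{in } \Gamma_i, \qquad (14)$$

where $\gamma_i^{(k)}(y_i)$ is given recursively by

$$\gamma_i^{(k)}(y_i) = D_{ij} D_{ij}^{*} E^i[E^j[\gamma_i^{(k-1)}(y_i) \mid y_j] \mid y_i] + D_{ij} F_j E^i[E^j[x \mid y_j] \mid y_i] + F_i E^i[x \mid y_i], \quad j, i = 1, 2;\ j \neq i;\ k = 1, 2, \ldots \qquad (15)$$

Furthermore, such a stable equilibrium solution is
necessarily unique.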
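The response iteration behind Definition 2 and Proposition 2 can be sketched numerically in a deliberately simple setting. The code below is a minimal illustration under stated assumptions, not the general Hilbert-space scheme: the decision spaces are scalar, the operators reduce to scalars `d` and `f[i]`, policies are restricted to the linear form "gain times observation", and each DM's subjective model enters only through two hypothetical regression coefficients, `a[i]` (DMi's prediction of the other observation from its own) and `b[i]` (DMi's prediction of the state from its own observation). Under these assumptions each best response is an update of a pair of gains.

```python
def best_response_iteration(d, f, a, b, k0=(0.0, 0.0), tol=1e-12, max_iter=10_000):
    """Iterate simultaneous best responses on the policy gains (k1, k2).

    Returns the limiting gains and the number of steps taken.
    """
    k1, k2 = k0
    for step in range(1, max_iter + 1):
        # Each DM responds to the gain the other DM used at the previous step,
        # using only its own subjective coefficients a[i], b[i].
        n1 = d * a[0] * k2 + f[0] * b[0]
        n2 = d * a[1] * k1 + f[1] * b[1]
        if abs(n1 - k1) < tol and abs(n2 - k2) < tol:
            return (n1, n2), step
        k1, k2 = n1, n2
    raise RuntimeError("response iteration did not converge")


def closed_form_gains(d, f, a, b):
    """Solve the two linear fixed-point equations for the gains directly."""
    denom = 1.0 - d * d * a[0] * a[1]
    k1 = (f[0] * b[0] + d * a[0] * f[1] * b[1]) / denom
    k2 = (f[1] * b[1] + d * a[1] * f[0] * b[0]) / denom
    return k1, k2
```

In this toy setting the iteration converges from every starting pair of gains whenever the product `d * d * a[0] * a[1]` is smaller than one in absolute value, which plays the role of the contraction condition discussed next; note that neither DM ever needs the other's coefficients, only the other's most recent gain.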


Let us introduce linear operators $S_i : \Gamma_i \to \Gamma_i$, $i = 1, 2$,
by

$$S_i(\gamma) = D_{ij} D_{ij}^{*} E^i[E^j[\gamma(y_i) \mid y_j] \mid y_i], \quad j \neq i;\ i, j = 1, 2. \qquad (16)$$

Note that $S_i$ indeed maps $\Gamma_i$ into $\Gamma_i$, because the
conditional expectation $E^j[D_{ji}\gamma(y_i) \mid y_j]$ maps $\Gamma_i$ into $\Gamma_j$
($j \neq i$) when the probability measures satisfy conditions
(1) and (2), and every element of $\Gamma_i$ is square-integrable
under $P^i_{y_i}$ and $P^j_{y_i}$.


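Furthermore, let us introduce the notation $\langle\langle S \rangle\rangle_i$
to denote the norm of a linear bounded operator $S$:
$\Gamma_i \to \Gamma_i$, which is defined by

$$\langle\langle S \rangle\rangle_i = \sup_{\gamma \in \Gamma_i} \left[ \langle S\gamma, S\gamma \rangle_i / \langle \gamma, \gamma \rangle_i \right]^{1/2} \qquad (17a)$$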


r .  (P. 


i>  : 


for  at  least  one  i*l  .2  where  ..  is  introduced  bv  (Sb). 
Furthermore,  a  sufficient  condition  for  (2Gb>  i ^ 


<<? 


ill 


(20c) 


This result provides a partial separation (in terms
of sufficient conditions) of the deterministic and
stochastic parts of the system. Now, if the subjective
probability measures assigned to the pair $(y_1, y_2)$ by the
two  DM* s  are  equivalent,  P ^ | a  becomes  a  projection 
operator,  thus  leading  to  satisfaction  of  (20b)  with 
P^30^*1V  and  thereby  satisfaction  of  (20a)  since  “'1. 
Hence, as a corollary to Proposition 4, we obtain the
following result which is known in different contexts
[7, 8, 9].


Corollary 1.

For the strictly convex quadratic team problem
with equivalent subjective probability measures assigned
by the two DMs to $(y_1, y_2)$, there exists a unique stable
equilibrium solution (the so-called team-optimal solution),
irrespective of the underlying common probability measure.

c 

1,  2 

For  team  problems  with  P  ,  a  result  along  the 
lines  of  Corollary  1  does  not  in  general  hold,  because 
the  operator  is  not  necessarily  a  projection 

operator,  i.e.  ie  may  not  be  able  to  find  p^,0<p^<J.,  to 
satisfy  (20c)  (or  (20b)].  Then,  the  general  condition 
is  (19)  [or  the  stronger  one,  (20a)]  which  places  some 
restrictions  on  the  parameters  of  the  cost  functional, 
as  well  as  the  probability  measures  P*-  and  P-.  To 
delineate  the  extent  of  these  restrictions,  we  now 
study  condition  (20c)  somewhat  further  and  obtain  the 
following  sufficient  condition. 

Corollary  2. 


and $r_i(S)$ to denote the spectral radius of $S$, which is
defined by [13]

$$r_i(S) = \limsup_{k \to \infty} \left[ \langle\langle S^k \rangle\rangle_i \right]^{1/k} \qquad (17b)$$

where $S^k$ denotes the $k$'th power of $S$. Finally, let us
introduce the linear operator

$$P_{i|j} = E^i[E^j[\,\cdot \mid y_j] \mid y_i], \qquad (18)$$

which maps $\Gamma_i$ into itself. Then, the following proposition,
whose proof depends on a contraction mapping
argument, provides a set of necessary and sufficient
conditions for existence of the unique equilibrium
solution alluded to in Proposition 2.

Proposition  3. 

When  the  probability  measures  satisfy  conditions 
(1)  and  (2),  the  decision  problem  of  section  2  admits  a 
unique  stable  equilibrium  solution  if,  and  only  if,  there 
exists  ci,  0<o^<l,  such  that 

W  *  ri<°uVi!i>  i  •  <19> 

for at least one $i = 1, 2$. □
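To see why a condition stated through the spectral radius (17b) is weaker than one stated through the operator norm, a finite-dimensional sketch may help. The code below is an illustration under assumptions made here: the operator is replaced by a 2x2 real matrix, and the Frobenius norm is used in place of the induced norm (in finite dimensions the limit in (17b) does not depend on which norm is chosen). All function names are hypothetical.

```python
import cmath
import math


def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][m] * B[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]


def frobenius_norm(A):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in A for x in row))


def spectral_radius_2x2(A):
    """Largest eigenvalue modulus, from the characteristic polynomial."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return max(abs((tr + disc) / 2.0), abs((tr - disc) / 2.0))


def gelfand_estimate(A, k):
    """Norm of the k-th power raised to 1/k, as in the limit (17b)."""
    P = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(k):
        P = matmul(P, A)
    return frobenius_norm(P) ** (1.0 / k)
```

For a matrix such as `[[0.5, 2.0], [0.0, 0.4]]` the norm exceeds one, so a norm-based test would be inconclusive, yet the spectral radius is 0.5 and repeated application of the matrix still contracts; `gelfand_estimate` approaches that value slowly as `k` grows.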

The  next  proposition  provides  a  set  of  stronger  but 
more  versatile  conditions  under  which  a  unique  stable 
equilibrium  solution  exists. 

Proposition  4 . 


For  a  given  inequality  (20c)  is  satisfied  if 

(U:  gl(0  !  gj  (l)P^  „  (dn]0>;.  .  ;:;)-0  i-1,2.  \  +  i . 

Vi  Y  yj|yi 

i  1  (21) 

where $g_i(\xi)$ and $g_j(\eta)$ are the Radon-Nikodym derivatives,
as defined by (2b).

4. Jointly Gaussian Distributions and Derivation of
Explicit Solutions

When the subjective probability measures of the
two DMs are equivalent, one special class of problems
that admit closed-form solutions is that with Gaussian
distributions, and equilibrium solutions in these cases
have been shown to be affine functions of the observations,
as documented in the literature for quadratic
team problems defined on Euclidean spaces [7], quadratic
nonzero-sum stochastic games on Euclidean spaces [8],
and quadratic continuous-time stochastic team problems
[9]. In this section, we study possible extensions of
these results to the case when discrepancies exist
between the subjective Gaussian distributions, as
reflected in the covariances of the random vectors
$(y_1, y_2)$.

Hence, let us now assume that $(y_1, y_2)$ are Gaussian
random vectors under both $P^1$ and $P^2$, with

mean $(y_1, y_2) = 0$, (22)


A  set  of  sufficient  conditions  for  (19)  to  hold  true 
for  at  least  one  i,  and  thereby  for  a  unique  stable 
equilibrium  solution  to  exist,  is  the  existence  of  a 
pair  >f  positive  scalars  (^^,c-*),  with 

,  (20a) 

such  that 


covariance  (y^.y^) 


>  0 


(23) 


These probability distributions clearly satisfy the
absolute continuity condition (1) of section 2, and
also satisfy the uniform boundedness condition (2)
whenever

$$W_i \triangleq (\Sigma^j_{y_i})^{-1} - (\Sigma^i_{y_i})^{-1} > 0, \quad j \neq i. \qquad (24)$$

Using standard properties of Gaussian distributions,
we obtain

$$g_i(y_i)\, E^i[g_j(y_j) \mid y_i] = q_i \exp\{-\tfrac{1}{2}\, y_i' D_i y_i\}, \qquad (25)$$

where

D.  *  M*!  .  -  M*!  .  B'1  -  I1  ,  j#i  (26a) 

i  ii  ij  J  ji  v. 


11  12 

\  /  o 

V1  V 

\  ,  f\ 

v,vA 

)  •  1,1 

(26b) 

I1  M1 

J  \ 

21  22 

-  \  “v  y 

"1/2 

B .  »  m)  .  +  W . 

J  JJ  J 

>2  ' 

(26c) 

w.w.  det 

(Ij  )  det  (B_1)  det 

(tf1)]1'2 

(26d) 

x  2 

yi  J 

y 

w . 

l 

*  det  ZX  /  det 

yi  -vi 

(26e) 

For the condition of Corollary 2 (i.e. (21)) to be
satisfied  for  a  given  finite  o^,  it  is  necessary  that 
expression  (25)  be  uniformly  bounded  in  y^,  which  holds 
true  if,  and  only  if, 

D.  >  0  .  (2?) 

Under  this  restriction,  condition  (20d)  becomes 

/i <  o .  , 

M  —  i 

and  referring  back  to  Proposition  4,  we  obtain  a 
sufficient  condition  for  existence  of  a  unique  stable 
equilibrium  solution  to  be 

Id.  .0* ,»*  O1  <  1  ,  (28) 

i;  ij  i  ■ 

for at least one $i = 1, 2$.

In  order  to  complete  the  solution  of  the  Gaussian 
decision  problem,  let  us  further  assume  that  x  is  also 
Gaussian  distributed,  with 

mean  (x)  *  0  , 


covariance  (x,y.,y0)  *  cov(x,v)  *  Z 


S  I1 


Here, $L_i : \mathbb{R}^{m_i} \to U_i$ are bounded linear operators,
constituting the unique solution to the linear operator
equations

,  I'1  •  -"I 

L.y  -  D  .3  .LI1  :J  rl  ;l  v. 

11  U  1  Vj  vj  >•.>'!  >’i  1 

.-1  .  .-1 

-  D. ,F.fJ  I-*  Z‘  zl  y. 

IJ  j  XV  .  V  V  V  .  V  .  •  1  (ill 

J  J  '  J  •  J- i  •  l 

-1  m . 

-  F.21  Z1  y,  -  0,  Vv.eiR  l,  1-1,2. 

i  xv .  v .  i  i 

i  i 

In the statement of Theorem 1, the condition (29a)
places some restrictions on the second moments of the
underlying distributions (in case a discrepancy exists),
which may however be relaxed if we are willing to consider
equilibrium policies in a more restricted space. More
specifically, satisfaction of (29a) ensures that regardless
of what initial set of policies the DMs start the
infinite recursion (19) with, every element of this
series is well-defined, and under (29b)-(29c) it will
converge to a unique limit which is affine; in other
words, even if the DMs start with nonlinear policies,
the end result will be an affine equilibrium solution.

The condition (29a) is restrictive, because we require
(without imposing any constraints on the policy spaces)
the series generated by (19) to be well-defined even
with nonlinear starting conditions. However, if we
restrict ourselves to affine policies from the outset,
under Gaussian distributions elements of the series (19)
will always be well-defined (without requiring (29a))
and will converge to the equilibrium solution provided
that (29b)-(29c) holds for at least one $i = 1, 2$. This
line of reasoning then leads to the following result:

Proposition 5.

Let $\Gamma_i^{\ell}$ be the class of all linear policies in the
form (30), with $L_i : \mathbb{R}^{m_i} \to U_i$ a bounded linear operator,
$i = 1, 2$. On $\Gamma_1^{\ell} \times \Gamma_2^{\ell}$, the statement of Theorem 1 is valid even
if (29a) does not hold true.

Finite-dimensional Decision Spaces


Then,  the  following  theorem  summarizes  the  complete 
solution . 

Theorem  1. 

Let

(i) $W_i > 0$, $i = 1, 2$, (29a)

and the following conditions hold for at least one
$i = 1, 2$:

(ii) $D_i > 0$ (29b)

(iii)  <  1/q 1  ,  i , j  =  1 , 2 ;  jtl.  (29c) 
Then, the quadratic Gaussian decision problem formulated
in this section admits a unique stable equilibrium
solution $(\gamma_1^o, \gamma_2^o)$, where $u_i^o = \gamma_i^o(y_i)$ are linear in $y_i$,
and are given by


(D  .D  , ) i 
max  ij  j  i 


*.  det(Bj)  det 


We have so far obtained conditions for existence
and uniqueness of stable equilibrium solutions, so that
the recursive relations (11) converge for all possible
starting points, either in $\Gamma_1 \times \Gamma_2$ or in $\Gamma_1^{\ell} \times \Gamma_2^{\ell}$. If we are
interested only in existence of equilibrium solutions
[cf. Definition 1], however, the corresponding conditions
will clearly be less restrictive. For a further elaboration
on this point, consider, for example, the class of
decision problems wherein all linear operators are
matrices. A stable equilibrium will exist, in this case,
under the condition [replacing (29c)]

lx  (D  D det (B  )  detll1  )/[detf2i  )  de t ( 2  1 ) j  ’  ] 1 “ 
max  i j  j i  j  y  ^  y 

i  , j -1 . 2 ;  Mi,  (32  i 

where $\lambda_{\max}(A)$ denotes the eigenvalue of the square
matrix $A$ which is maximum in absolute value. Furthermore,
this unique stable equilibrium solution will be
given by (30), where $L_i$ is now a matrix (of appropriate
dimensions) satisfying (31) with the multiplying $y_i$'s
left out. Such a solution is definitely also an
equilibrium solution (cf. Definition 1), and as such it
exists whenever (31) is solvable. A sufficient condition
for solvability of (the finite-dimensional version of)
(31), which is less restrictive than (32), is given
below as Proposition 6.

Proposition  6. 

When the decision spaces are finite dimensional,
the quadratic Gaussian decision problem admits an
equilibrium solution (not necessarily stable) given by
(30)-(31), if, for at least one $i = 1, 2$, the following
inequality holds


(d.  d. ,)\\\  a  r1  i)|*  1.  (33) 

max  ij  ij  *  max  y  v.  v.  v .  y  y  1 

1*3  'J  'J  i  '1  □ 

For the purpose of illustrating the various conditions
of existence (and uniqueness) obtained above and
in the preceding sections, let us now consider a
family of scalar quadratic Gaussian team problems, with
the DMs having different subjective probabilities on
the uncertain quantities. To be more specific, let


^<1.  F1  =  tv 


n*m^*rm2*^»  anc* 


2  2 
*  n ab>e  ,  vab>c  . 


Firstly,  condition  (1)  of  section  2  on  absolute 
continuity  of  various  measures  is  clearly  satisfied, 
because  all  probability  measures  are  Gaussian.  Secondly, 
condition  (2)  of  section  2  is  satisfied  if,  and  only  if, 
both 

$$0 < \mu < 1, \qquad 0 < \eta < 1. \qquad (35)$$

This  is  condition  (29a)  of  Theorem  1.  For  condition 
(29b),  we  evaluate  : 

D,  -  (uab~j2)  (1-u)  [ua2-c2-(l-u)ai>]/ 

22  2  (36a) 

/ta[y  a  -(l-uHuab-o  )]}  0 

0,  ■  (nab-e2)  (1— n )  [n£2-s2-(l-n)a£]/ 

22  2  (36b) 

/{b[nb‘ -(l-nHncb-e  )]}  >  0  , 

and  require  either  (36a)  or  (36b)  to  be  satisfied. 
Finally,  condition  (29c),  whose  counterpart  in  this 
context  is  (32),  dictates  either 

•— t ~  ,  Jt" i  “ <  n  [u‘-2-(l-’j)  (udi-c2)  ]  (37a) 


Finally, if our interest lies only in the existence
of an equilibrium solution (not necessarily stable), the
condition that replaces (29a)-(29c) is (33) which, in our
case, is independent of $i$ and reads:

i , 

\d'f  \je/jb\  '■  1.  <1H) 

This  condition  is  clearly  much  less  restrictive 
than  (35) -(37),  ancj  is  satisfied  whenever  a  >  ,  •: .  and  l  >  1  ;*  \ 
which  are  reasonable  restrictions  in  (39). 

Infinite-dimensional Decision Spaces

As another illustration of Theorem 1, for infinite-dimensional
decision spaces, we consider in the sequel
a class of stochastic Gaussian team problems defined in
continuous time. More specifically, let $U_1 = U_2 = L_2(0,T)$,
the Hilbert space of all scalar-valued Lebesgue square-integrable
functions on the bounded interval $[0,T]$, endowed with
the standard inner product $\int_0^T u(t) v(t)\, dt$, for $u, v \in L_2(0,T)$.

Furthermore, let $Y_1 = Y_2 = \mathbb{R}$, and the Gaussian statistics
have zero mean, and variances as given in (39). Let
$D_{12} = D_{21}^{*}$ be the Fredholm operator

$$D_{12} u = \int_0^T K(t,s)\, u(s)\, ds$$


^b*!i!2'  u(-ri2-(l-n) (nap-e2) )  (37b) 

provided that the terms on the right-hand-side are positive
(if not, then the inequalities will accordingly
change direction).

The set of values for $a, b, c, e, \mu, \eta$ that satisfy
(35)-(37) is clearly not empty. To gain some further
insight into these conditions, let us consider the
class of team decision problems in which the discrepancies
between the DMs' perceptions of the variances of
different Gaussian random variables are relatively small,
that  is  there  exist  sufficiently  small  c^>0  and  Ct>0 
such  that  u-l-c^,  n»l-e->,  and  furthermore  z-j,  and  |<r| 
is  considerably  smaller  than  both  a  and  b.  Note  that, 
when  conditions  (35)-(37)are  all  satisfied 

(note  that  |d|<l  because  of  strict  convexity  of  the 
objective  functional),  regardless  of  the  relative 
magnitudes  of  e  and  2.  Hence,  when  the  discrepancy  is 
only  in  the  perceptions  of  the  correlation  between  y^ 
and  y->,  the  scalar  quadratic  Gaussian  team  problem 
always  admits  a  stable  equilibrium  solution.  Now,  for 
nonzero,  but  positive,  and  sufficiently  small  Cj,  the 
dominating  term  In  (36a)  will  be 

2  2  2  2  3 

)(u a  )/u  a 

which  is  positive,  in  view  of  (34)  and  the  initial 
hypothesis  that  !a/d|>>l.  Likewise,  Dj  Is  positive, 
whenever  Oxcj'*!’  and  |i>/e|>>l.  Furthermore,  given  a 
7,  0<<T<1,  we  can  always  find  and  t,,  both  in  (0,1), 
so  that  both  (37a)  and  (37b)  are  satisfied  whenever 

ji|<cT.  Hence,  the  conclusion  is  that  when  the  deviations 

of the perceptions of the DMs from the common Gaussian
probability measures are incremental (and satisfying
(35)), the linear equilibrium solution of the Gaussian
scalar team problem retains its stability property (but,
of course, at a different (possibly close, in norm)
equilibrium point).


where $K(t,s)$ is a continuous kernel on $0 \le t, s \le T$,
and finally let $F_i = f_i(t)$, $i = 1, 2$, which are continuous
functions on $[0,T]$.

Now, conditions (29a) and (29b) depend only on the
probabilistic structure, and are therefore again given
by (35) and (36), respectively. For (29c),
however, we have to obtain the counterpart of
(37),  by  simplv  replacing  'ij  with  the  norm  of  the 
operators  and  0*tDj->,  respectively.  Since 

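$$D_{12}^{*} u = \int_0^T K(s,t)\, u(s)\, ds,$$

the self-adjoint operator $D_{12} D_{12}^{*}$ is given by

$$D_{12} D_{12}^{*} u = \int_0^T \!\! \int_0^T K(t,\tau) K(s,\tau) u(s)\, d\tau\, ds = \int_0^T A(t,s) u(s)\, ds,$$

where

$$A(t,s) = \int_0^T K(t,\tau) K(s,\tau)\, d\tau. \qquad (40a)$$

Let

$$\lambda \triangleq \left( \int_0^T \!\! \int_0^T |A(t,s)|^2\, dt\, ds \right)^{1/2}. \qquad (40b)$$

Then

$$\| D_{12} D_{12}^{*} u \|^2 = \int_0^T \left| \int_0^T A(t,s) u(s)\, ds \right|^2 dt \le \int_0^T \left[ \int_0^T |A(t,s)|^2 ds \right] \left[ \int_0^T |u(s)|^2 ds \right] dt,$$

where the second step follows from the Cauchy-Buniakowski
inequality. Hence,

$$\| D_{12} D_{12}^{*} \| \le \lambda,$$

and because of symmetry $D_{12}^{*} D_{12}$ is also bounded in norm
by the same quantity. This then leads to the following
counterpart of (37): A sufficient condition for satisfaction
of (29c) is either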

<  n(u2JZ2-(  1-u)  (uj£--?to)  ] 

i  y  *>  ■>  f 

( nai? Ip  > 

provided that the terms on the right-hand-side are
positive, where $\lambda$ is defined by (40a)-(40b).
Under (35) and either (36a) and (41a) or (36b) and (41b),
the continuous-time static decision problem
formulated above admits a unique stable equilibrium
solution, and this solution is given by (from Theorem 1):

$$\gamma_i^o(t, y_i) = k_i(t)\, y_i, \quad i = 1, 2, \qquad (42)$$

where $k_i(t)$ are continuous functions on $[0,T]$, satisfying

T  ■*  T 

.  i.t)  -  t,— >  .  •  u.s)k1(s)ds  -  (o*y  j/ai-);K(t,s)r‘l(S)ds 


<-’xyl/i>fX(e>  *  0 


(43a) 


k,(c)  -  ITT)  ;  (s,  t  )k,  (s>ds  -  (r  -•/Jr); K(s,t)£.(s)ds 


(43b) 


Note that $k_i(t)$ above stands for operator $L_i$ in (31),
and we have already shown that a unique solution to
both (43a) and (43b) exists in $L_2[0,T]$, under (35) and
either (36a) and (41a) or (36b) and (41b), and this
solution is also continuous.
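The norm bound used for the continuous-time example (the operator norm of the composite integral operator is bounded by the square root of the double integral of its squared kernel) can be probed numerically. The following is a minimal sketch under assumptions made here: the interval is discretized with a midpoint rule, the example kernel in the usage note is hypothetical, and the operator norm is only estimated from below by a power iteration. Function and variable names are illustrative.

```python
import math


def hilbert_schmidt_bound_and_norm(kernel, T=1.0, n=30, iters=200):
    """Return (lam, est): the double-integral bound and a norm estimate.

    kernel(t, s) plays the role of K(t, s); the composite kernel
    A(t, s) = integral of K(t, tau) K(s, tau) d tau is formed by quadrature.
    """
    h = T / n
    grid = [(i + 0.5) * h for i in range(n)]
    K = [[kernel(t, s) for s in grid] for t in grid]
    # Composite kernel A(t, s) on the grid (midpoint rule in tau).
    A = [[h * sum(K[i][m] * K[j][m] for m in range(n)) for j in range(n)]
         for i in range(n)]
    # Discretized version of the square root of the double integral of A^2.
    lam = math.sqrt(h * h * sum(A[i][j] ** 2 for i in range(n) for j in range(n)))
    # Power iteration: the ratio |M u| / |u| never exceeds the operator norm.
    u = [1.0] * n
    est = 0.0
    for _ in range(iters):
        v = [h * sum(A[i][j] * u[j] for j in range(n)) for i in range(n)]
        norm_u = math.sqrt(sum(x * x for x in u))
        norm_v = math.sqrt(sum(x * x for x in v))
        est = norm_v / norm_u
        u = [x / norm_v for x in v]
    return lam, est
```

Calling it with, say, `kernel=lambda t, s: math.exp(-abs(t - s))` returns an estimate that stays at or below the bound, which is the inequality the sufficient conditions rely on; the bound is convenient because it needs only the kernel, not the spectrum of the operator.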

Finally, if our interest lies only in the existence
of a unique linear equilibrium solution in the class of
linear policies (not necessarily stable), the required
condition is unique solvability of the integral
equations (43a)-(43b), for which a sufficient condition
is [6]

ies/izb)  \  <  1  (44) 


general Hilbert-space framework adopted in this paper
and the general solution presented in section 4
(Theorem 1) applies to other models also, such as the
one similar to the continuous-time team problem treated
in [9] but with the DMs having different probability
models. It is expected that some explicit results
(closed-form solutions) can also be obtained in this
case, but this point has not been pursued in this
paper and is left for future research.

One source of motivation for the research reported
in this paper has been (as discussed in section 1) the
desire to investigate the sensitivity and robustness of
team-optimal solutions (in stochastic teams) to independent
variations in the perceptions of the DMs of the
underlying probability space (and, in particular, the
probability measure). The analysis of this paper indeed
provides a framework for such a study when the roles of
the DMs are symmetric, since an equilibrium theory (of
the "Nash" type) has been established (in terms of
existence, uniqueness and derivation of stable solutions)
within an "ε-neighborhood" of the team-optimal solution.
Some further work is needed in order to determine the
"satisfiability" of the several existence conditions
obtained in the paper, when the region of interest is
an ε-neighborhood of a common probability space, and to
further extend the analysis to an investigation of
sensitivity and robustness properties of team solutions
(obtained under the stipulation of existence of a common
underlying probability space) in this ε-neighborhood.

Extensions of the analyses of this paper to
intrinsic nonzero-sum games with symmetric and asymmetric
modes of decision making under different subjective
probabilities can be found in [10] and [11], respectively.


where $\lambda$ is defined by (40b).

5. Concluding Remarks

In the preceding sections, we have developed an
equilibrium theory for two-person quadratic team
decision problems with static information patterns,
wherein the decision makers (DMs) do not necessarily
have the same perception of the underlying probability
space; that is, our formulation allows for discrepancies
in the way different DMs perceive the probability space.
As indicated earlier, when such discrepancies exist
team problems have to be analyzed in the framework of
nonzero-sum games, and in such a framework the Nash
solution concept is the most suitable equilibrium concept
if the DMs occupy symmetric (non-hierarchical)
positions in the decision process. When the equilibrium
policies satisfy the further requirement of stability,
this solution becomes very appealing because, in order
to arrive at equilibrium (as a consequence of an
infinite number of response iterations), each DM does
not have to know the subjective probability measures
perceived by the other DM, but has to know only the
policy adopted by the other DM at the most recent step
of the iteration. Under the stipulation that each DM
chooses his policy at each stage by responding optimally
to the other DM's policy, we have derived in this paper
a set of conditions that ensure existence of unique
limits to such iterations, and thereby existence of a
unique stable equilibrium solution.

The analyses of section 4 have shown that when the
underlying probability distributions belong to a Gaussian
class, conditions of existence and uniqueness, as well
as the stable equilibrium solution itself, can be
obtained explicitly, with the latter being affine in
the available static information. For two special cases,
namely when the decision spaces are finite dimensional
or when the decision problem is defined in continuous
time with a specific cost structure, we have obtained
analytic expressions for the gain operators, and have
also further delineated the existence conditions. The

Abstract

This  paper  develops  an  equilibrium  theory  for  two- 
person  cvo-criceria  stochastic  decision  problems  wich 
j  scaclc  information  pactema  and  an  asymmetric  mode  of 
*  decision  making,  wherein  the  decision  makers  have  dif¬ 
ferent  probabilistic  models  of  the  underlying  process . 
The  objective  functions  are  quadratic  and  the  decision 
spaces  are  general  inner-product  spaces.  Firstly,  a 
3  sec  oc  sufficient  conditions  is  obtained  for  existence 
and  uniqueness  of  Scackelberg  equilibria,  and  a  uniform¬ 
ly  convergent  iterative  scheme  is  developed,  whereby 
the  equilibrium  poliev  of  che  Leader  can  be  obtained 
i  by  evaluating  a  number  of  conditional  expectations. 

^  When  che  probability  measures  are  Gaussian,  che 

equilibrium  solution  is  shown  to  be  generlcally  non¬ 
linear.  wich  the  linear  structure  prevailing  only  in 
some  special  cases  which  are  delineated  in  che  paper. 

i 

1.  Introduction 

In [1] we have presented a theory of equilibrium
for team decision making when the decision makers have
different perceptions of the underlying probability
measures and when the mode of decision making is
symmetric. These results have also been extended to
stochastic nonzero-sum games, again under the symmetric
mode of decision making, in [2]. It has been shown in
these two references that a stable equilibrium solution
exists under a reasonable set of conditions which place
some restrictions on the probabilistic and nonprobabilistic
parts of the description of the decision
problem. Furthermore, the solution was obtainable by
successive approximation, which was shown to lead to
affine equilibrium solutions when the underlying
statistics were Gaussian.

As it has been pointed out in [1], even though the
mode of decision making is irrelevant in stochastic
team problems with a common probability space for both
DMs, it becomes an important factor in the derivation
of equilibrium solution when there is a discrepancy in
the DMs' perceptions of the probability measures.
Hence, even in team problems, derivation of equilibrium
solution requires separate (and possibly different)
analyses in the two cases corresponding to symmetric
and asymmetric modes of decision making. Having
provided a complete analysis of the former in [1] and
[2], here we direct attention to an investigation of
the nature, existence, uniqueness and derivation of
equilibrium solution under the latter mode. Towards
this end, we start with a more general class of problems,
as in [2], which subsumes stochastic team problems
as a special class, and we formulate the hierarchical
decision problem in general Hilbert spaces and
under a general probabilistic description. As for a
solution concept we adopt that of Stackelberg equilibrium,
which is the natural counterpart of the (Nash)
equilibrium solution of [1] in the present context.

A problem formulation and precise delineation of a
set of conditions which lead to a meaningful description
are provided in Section 2, where we also rely on the
more detailed exposition of [1] and [2]. A derivation
of unique Stackelberg equilibrium solution, proof of
existence and uniqueness, and some elucidation of the
required conditions [cf. Condition 3] occupies us
through Section 3. Perhaps a most surprising by-product
of this analysis, as contrasted with [1], is in the
characterization of the equilibrium solution for the
special case of jointly Gaussian distributions: The
solution is generically nonlinear, and contains summation
of terms which involve products of linear functions of
measurements with exponential terms (whose exponents
are quadratic in the measurements). A full description
of the solution for this class (of Gaussian distributions)
is given in Section 4, where it is also shown
that for some special cases the solution is still affine
in the measurements. One of these cases corresponds to
the special class of Gaussian distributions where the
DMs' perceptions of the marginal probability distributions
of each other's measurements are identical, but
the correlations between the measurement variables
could be perceived differently. The section also
contains some discussion on stochastic team problems,
treated as a special case.

The paper ends with the concluding remarks of
Section 5. For proofs of some of the results given in
this paper the reader is referred to the more complete
version [3].

2.  Mathematical  Formulation  and  Basic  Definitions 

Problem formulation, with the exception of the
definition of equilibrium, is analogous to that of [1],
and hence the presentation below will be brief.

Using the notation of [1], we let $x$ (unknown state
of Nature) be a random vector on $(R^n, B^n, P_x)$, $y_i$
(observation of DMi) be a random vector on
$(R^{m_i}, B^{m_i}, P_{y_i})$, and $P$ be a class of probability
measures for $z = (x', y_1', y_2')'$. Choosing two elements
out of $P$ (to be denoted $P^1$ and $P^2$), we assume that

Condition 1. $P^1_{y_2} \ll P^2_{y_2}$, $P^2_{y_1} \ll P^1_{y_1}$ (absolute continuity).

Condition 2. The Radon-Nikodym derivative

$$g^i(\xi) = dP^j_{y_i} / dP^i_{y_i}, \quad j \ne i, \qquad (1)$$

is uniformly bounded a.e., $i = 1, 2$.
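To make the boundedness requirement on the Radon-Nikodym derivative concrete, the following Python sketch treats the simplest case of two scalar zero-mean Gaussian perceptions of the same measurement. The function names and the numerical variances are illustrative assumptions, not part of the paper. The density ratio is bounded exactly when the variance in the numerator does not exceed the variance in the denominator.

```python
import math


def rn_derivative(y, var_num, var_den):
    """Density ratio dN(0, var_num) / dN(0, var_den) evaluated at y."""
    scale = math.sqrt(var_den / var_num)
    return scale * math.exp(-0.5 * y * y * (1.0 / var_num - 1.0 / var_den))


def is_uniformly_bounded(var_num, var_den):
    """The ratio stays bounded in y exactly when var_num <= var_den."""
    return var_num <= var_den


def sup_bound(var_num, var_den):
    """Supremum of the ratio over y; it is attained at y = 0 when finite."""
    if not is_uniformly_bounded(var_num, var_den):
        return math.inf
    return math.sqrt(var_den / var_num)
```

For instance, `rn_derivative(10.0, 4.0, 1.0)` is already astronomically large, which is how the uniform boundedness requirement fails when the numerator measure has the heavier tails.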

The decision variable of DMi belongs to a real
separable Hilbert space $U_i$, with inner product $(\cdot,\cdot)_i$.
Permissible policies (decision rules) for DMi are
measurable mappings

$$\gamma_i : R^{m_i} \to U_i$$

satisfying


(7) 


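$$\int_{Y_i} (\gamma_i(\xi), \gamma_i(\xi))_i \, P^j_{y_i}(d\xi) < \infty, \quad j = 1, 2.$$

Let $\Gamma_i$ be the space of all such policies, with the
probability measures satisfying Conditions 1 and 2.
Then, it is a Hilbert space under the inner product

$$\langle \gamma, \beta \rangle_i = \int_{Y_i} (\gamma(\xi), \beta(\xi))_i \, P^i_{y_i}(d\xi). \qquad (2)$$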

Now finally, let us introduce a cost function for each
DM, by

V  "  y  "Wi  *  :  /  <V*>-  DjViu,)JpJi(d5) 


Y. 

J 


-  .eVScIv,  ]»  -  !  (y  (O.F^x)  Pi(d*.Y  dt) 

-  1  v„V  J  J  J  1 


XxY. 

J 


•'  l'E  iDij  j^i'^i'-l  <3) 

where  D7.,,  F^,  FE,  0^(  are  linear  operators.  X^R^, 


,  and 


E1r.(z)  y1!  a  ;  .(*)pf  (dz 1 y, ) 

■ i  1 


Here we assume that the mode of decision making is
asymmetric, with one DM dominating the decision process.
If there is such a hierarchy, which permits one decision
maker (say DM1) to announce and enforce his policy on the
other DM, the relevant equilibrium solution is the
leader-follower (Stackelberg) solution defined below.

Definition 1: A pair of policies $(\gamma_1^s, \gamma_2^s) \in \Gamma_1 \times \Gamma_2$
constitutes a leader-follower (Stackelberg) equilibrium
solution to the decision problem formulated above, and
with unique follower responses, if there exists a unique
mapping $T_2: \Gamma_1 \to \Gamma_2$ satisfying


a-  D^EHx  y:l)2  ?ly,(d0  *  ■'v-E1[F3xly1J>l 

+/  (D;,:2!i(y,),y,)  +  F‘E2[x-y  J ,p;x),P1(dx,Y,,d;) 

XxY,  1  1  *  *  1 

-  <7.E1I0J20;iE:[vty1)!y;j;y1] 

+  E^D^tx  y2]ly1)>l. 

where we have deleted the subscript 1 in $\gamma_1$ in order to
simplify the notation. Now, since $\Gamma_1$ is a linear space,
and $J$ is the sum of terms homogeneous of degree zero,
one and two (maximum), any minimizing solution $\gamma \in \Gamma_1$
will have to satisfy

$$\Delta J(\gamma;h) = J(\gamma+h) - J(\gamma) = \delta J(\gamma;h) + \delta^2 J(\gamma;h) \ge 0, \quad \forall h \in \Gamma_1, \qquad (8)$$

where $\delta^i J(\gamma;h)$ is the Gateaux variation of $J(\gamma)$ of
degree $i$. Extensive manipulations, details of which
are given in [3], lead to the following expressions for
$\delta J$ and $\delta^2 J$:


5J(y;h) 


<h.y>1  -  /  (h(y^) ,  (Sv)(y1))1?J  (dy:) 
Y1  '  l 

l  (My,).!^))^  (dyL) 

Y1 


<9) 


i“J(  .;h)  •  7  'h.h>,  -  t  (h(v . )  .g* ( y ,  )E* ■  g"  y  , jj)7, D^ , 

-.0) 

.  d;,  E”  ’h(  ■"  j  y,j;y.  !).?*  >dy.)  -  h.DI^Ij?.  .h>. 


where  i:  7.-*^  and  ;S7.  are  defined  by 


. ;  > 


V( 


•  ,)«:.  x.*. 


-  S',  ;  (  V, 


1 37,D' , : 


.  i  y .  ; 


md  furthermore 


(5) 


3. General Sufficient Conditions for a Stackelberg
Equilibrium Solution

In this section we obtain some general sufficient
conditions for existence of a Stackelberg equilibrium
solution, and provide a complete characterization of
the solution. Subsequently we consider some special
cases with some further structure imposed on the cost
functionals and the probability measures.

Firstly we obtain an expression for DM2's reaction
$T_2: \Gamma_1 \to \Gamma_2$ as defined by (4), using Proposition
1 of [1]:

$$(T_2\gamma)(y_2) = D^2_{21} E^2[\gamma(y_1) | y_2] + F^2_2 E^2[x | y_2], \qquad (6)$$

where, for notation, we refer the reader to [1]. Note
that the uniqueness assumption of Def. 1 is satisfied
in this case. Hence, the derivation of the leader's
Stackelberg policy $\gamma^s \in \Gamma_1$ involves (in view of (5)) the
minimization of $J_1$ over $\Gamma_1$ after $\gamma_2$, given by (6), is
substituted in. This substitution yields

*  0:iD220:igl’(yi)E:;®:(y:)E:[v(yi)  y:]  yi] 

—  ( y l )  *  F^Etxiy^!  -  D“, Di.?7gL(y^)E* tg" (y ,) E* [x  y,i 

-D3;F;E1[E2[x,y:i;y1]  -lib) 

* 

+  D“2F;g1(y1)E*[g'(y;;;Ei:x!y:]  y,  j  . 

,  :  U.-il,  is  a  Linear  operator  given  bv 

i.  •  1.  L  L 

P,  ,  (y  )  *  (y,)  !  v^j  ,  y,  j  ,  |12) 

is  che  space  of  y .-measurable  random  variables  caking 
values  in  U^,  and  gMt)  are  che  Radon-Nikodym  deriv- 
acives  given  by  (1).  Hoce  chac  is  relaced  co  ?.  , 

defined  by  (18)  in  [L]  bv 

Pl;l[v(7l)2  *  (P:  i'),;/i) 

where  che  laccer  (which  is  i  naopine  from  ",  mco 
has  been  used  in  (10)  and  will  also  be  used*- in  che 
sequel  whenever  needed. 

N’ow,  since 
:  J  (■'  ,h)  *  0 

■Z']<  ,.n)  ;  ■} 

a  Scackeiberg  solution  will  e:tisc  r'or  che  leader 

if.  ind  onl"  if. 


(3)  is  aLso  equivalent  co 

vhe:,  j 

l 

vhe-. 


O 


It)  (10)  Is  nonnegacive  definite,  and  (from  (9)): 
(It)  .(v^)  -  i*-*)(*  )  -  ity^j  •  0,  a. a.  P2  .  (Id) 

Since  ch«  firsc  of  chese  conditions  do«s  not  depend  on 
.  tne  ootimal  solution  is  solelv  determined  bv  (Id), 
a 


'V  -  yL) 


* 

•*  3i:o^2«1(y,)E*(g'(y,)E‘l -(y,) '»,] ly, 


*  D2iF;gl(yi,E2:*2(y:,Eltx  v:1'yi1 

*  3:iD::D2i®ltyi,EZtg'(y2JE~l  y(!V  ,y:1  yi* 


do) 


F^E^x.v^  -  D;iDi2?:g1(y1)E2!g:(y;,)E2[x  v,]^] 


D2,F2E1(E2[x.y,]  v  ] 

i.m  m  •  1 


wnere  we  have  utilized  the  fact  chat  the  adjoint  of 
?.  is  a  linear  operacor  P*  given  by  (see{31) 

■l  „  (d"xdv,)P2  (dy  xdy,) 

ri  .  ,  1*2  yiyi  1 

P.  .  Cy  )  .  .  .(„>/  -■■■■, - f-= - 

y;  ^  *;  My,) 

*  •  •  *>  •  -  i  *■ 


»  g*(y:)E'[g*(y,jEi[-,(y1)  iy„j  jy,  I  . 

Furthermore.  condition  (i!  can  be  rewritten  as 


(16) 


a  *  :  -  - 

•  :  . 

.  n-  n*  a  .  3  3  ■>  .  i 

i:  :i  i  i  :i  12'  i  1  - 


ii:> 


■  r.n ra 
$$(K\gamma)(y_1) = g^1(y_1) E^2[g^2(y_2) E^2[\gamma(y_1) | y_2] | y_1]. \qquad (18)$$

•  *■  -  L  •»  A. 


We now summarize these results in the following
proposition:

Proposition 1: Under Conditions 1 and 2, the
decision problem with multiple probability measures
admits a Stackelberg equilibrium solution if, and only
if, $A$ is nonnegative definite and (13) admits a solution
in $\Gamma_1$.

Equation (15) will, in general, not admit a closed-form
solution, even if all random variables are jointly
Gaussian distributed (see Section 4); therefore, we
will have to resort to numerical computations which
will involve a recursion of some type. Hence, in
analyzing the conditions of existence of a solution to
(13) we may also require that such a numerical scheme
be globally convergent (or stable, in the sense defined
in [1]). One appealing scheme whereby a unique solution
to (15) [or, equivalently, (11)] can be obtained is the
recursion

'  lt*(y,;  *  (j''*  *^)(y, )  -  ,  k»i,2,...  (19) 

where $\gamma^{(0)}$ is chosen as an arbitrary element of $\Gamma_1$. If
the limit $\lim_{k \to \infty} \gamma^{(k)} = \gamma^s$ exists in $\Gamma_1$ for all such
initial choices, then $\gamma^s$ will necessarily constitute
a solution to (13). A sufficient condition for this
is the following:

Proposition 2: In addition to the conditions of Prop.
1, assume that there exists a scalar $\rho$, $0 < \rho < 1$, such
that

$$r(\mathcal{L}) \le \rho, \qquad (20)$$

where $r(\mathcal{L})$ is the spectral radius of the linear operator
$\mathcal{L}$ of (19). Then, the
decision problem admits a unique Stackelberg equilibrium
solution $(\gamma^s, T_2[\gamma^s])$, where $\gamma^s \in \Gamma_1$ is the limit of
the iterative scheme (19), and $T_2$ is the affine
operator (6).
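The role of the spectral radius condition in the recursion (19) can be illustrated in finite dimensions. The sketch below is a hedged Python illustration only: the 2x2 matrix `M` and vector `c` are hypothetical stand-ins for the linear part and the constant part of the recursion, and plain lists replace the function-space setting of the paper.

```python
def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def successive_approximation(M, c, start, tol=1e-12, max_iter=10_000):
    """Iterate g <- M g + c until successive iterates differ by at most tol."""
    g = list(start)
    for k in range(1, max_iter + 1):
        nxt = [a + b for a, b in zip(mat_vec(M, g), c)]
        if max(abs(a - b) for a, b in zip(nxt, g)) <= tol:
            return nxt, k
        g = nxt
    raise RuntimeError("iteration did not converge")


# Hypothetical stand-in operator; its eigenvalues have modulus sqrt(0.17) < 1.
M = [[0.2, 0.5], [-0.3, 0.1]]
c = [1.0, -2.0]

limit_a, steps_a = successive_approximation(M, c, [0.0, 0.0])
limit_b, steps_b = successive_approximation(M, c, [100.0, -50.0])
```

Because the spectral radius is below one, both starting points reach the same limit, which mirrors the requirement that the limit exist for all initial choices.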


We now elaborate on (20), so as to bring it to a
form which separates out the contributions from the
deterministic and probabilistic components of the
problem. Towards this end, let us first note that
using (18) in (11a):


r(«)  •  -  D^D^D^fO 


(21) 


and utilizing the inequality relationship between
spectral radius and norm of an operator (see [3]),
this spectral radius can be bounded from above by


'<^2D21P 


111 


2  1  -* 

+  °2i312Pl  1 
D22321


K>>, 


the 
where $\langle\langle \cdot \rangle\rangle_1$ is the operator norm as defined in
[1, (17a)]. Using the standard (triangle inequality)
property of norms, this can further be bounded from
above by


-  <312D21P1 : 1  ”  °21°I2?r  1^1  *  ■ 

■'*12 

Now  since  both  and  K  map  a  Hilbert  space  (7s) 
into  itself,  using  the'noro  inequality  for  products  or 
linear  operators,  we  further  have 


-  :°12°21Pi:i  *  °21Dl2Pll 


1  1 


no: 


,*.*■* 

i:. a:,?. 


rrt; 


where the equality follows because (i) the spectral
radius and norm of a self-adjoint linear operator are
equal [5, p. 514], (ii) the norm of a "non-self-adjoint"
linear operator $K$ is equal to the square root of the
spectral radius of the self-adjoint operator $K^*K$ (see
[3]). Finally, the latter is bounded from above by


1  S*l*  1 /S  1  / ' 

r(2)  :2«0i2D21D21O12,i  /_  !r(PM?l,l' 


■>  1  *> 


*  r(D;,02,3;,)  fr(K*X) ] ' 
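The two operator facts invoked in this bounding argument, namely that the spectral radius never exceeds the norm and that the two coincide for self-adjoint operators, can be checked on small matrices. The following Python sketch is an illustration under the assumption that 2x2 real matrices stand in for the operators; the helper names are hypothetical.

```python
import cmath
import math


def spectral_radius_2x2(M):
    """Largest eigenvalue modulus of a real 2x2 matrix."""
    (a, b), (c, d) = M
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return max(abs((tr + disc) / 2), abs((tr - disc) / 2))


def gram(M):
    """Return M^T M, the finite-dimensional counterpart of K*K."""
    (a, b), (c, d) = M
    return [[a * a + c * c, a * b + c * d], [a * b + c * d, b * b + d * d]]


def operator_norm_2x2(M):
    """Induced 2-norm: square root of the spectral radius of M^T M."""
    return math.sqrt(spectral_radius_2x2(gram(M)))


nilpotent = [[0.0, 1.0], [0.0, 0.0]]   # not self-adjoint: radius 0, norm 1
symmetric = [[2.0, 1.0], [1.0, 2.0]]   # self-adjoint: radius and norm agree
```

The nilpotent example shows why the norm, and not the spectral radius, has to be used when bounding a product with a non-self-adjoint factor.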
Now, let us assume the following:

Condition 3. There exist four positive scalars
$\sigma_1, \sigma_2, \sigma_3, \sigma_4$, satisfying


2  -V2  *  J3‘  4  1 

such  chat 

.*>•>*  i*  *>  *<*  * 

rf012D21D21312)i“i  •  r'D213223:i>-J3 

r(PllPl!l>-°2  ’  r(K*K)--'I 


(23) 

(24a; 
1 24b ) 


Then, we have

Theorem 1: Under Conditions 1 and 2 of §2 and Condition
3 given above, the decision problem admits a unique
Stackelberg equilibrium solution $(\gamma^s, T_2[\gamma^s])$, where
$\gamma^s \in \Gamma_1$ is the limit of the iterative scheme (19), and $T_2$
is given by (6).

Proof: The result follows from Prop. 2 and the discussion
and derivation that leads to Condition 3,
provided we show that the given three conditions subsume
(17), i.e. nonnegativity of operator $A$. We now verify
that Condition 3 in fact implies that $A$ is a strongly
positive operator. First note that $A$ is self-adjoint,
because $K$ commutes with $D^1_{22}$. Hence, we can write
down the inequality

nA-I)  _  -  r'D;.D;,3:.  <K*K*'.) 


3 


♦  tioli2Dhh  i +  °£o&\  i>  • 

Then, using the line of arguments that led to (22) from
(21), and the spectral radius inequality for the product
of two self-adjoint operators, we obtain the bound


1/2 


r(A-I)  <  (=•  r(D*,Di,D^)  r(K+«*) 

+  r[(D12D21°21D12)ll/:  [r(fl;l?l|l)! 

:  7  -'3  r(K+K*)  +  or,  . 

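But note that

$$r(K+K^*) = \sup_{\gamma} \frac{\langle \gamma, (K+K^*)\gamma \rangle_1}{\langle \gamma, \gamma \rangle_1} = 2 \sup_{\gamma} \frac{\langle \gamma, K\gamma \rangle_1}{\langle \gamma, \gamma \rangle_1},$$

and since, from the Cauchy-Schwarz inequality of inner
products,

$$\langle \gamma, K\gamma \rangle_1 \le \langle \gamma, \gamma \rangle_1^{1/2} \, \langle K\gamma, K\gamma \rangle_1^{1/2},$$

we have

$$r(K+K^*) \le 2 \sup_{\gamma} \left[ \frac{\langle K\gamma, K\gamma \rangle_1}{\langle \gamma, \gamma \rangle_1} \right]^{1/2} = 2 \sup_{\gamma} \left[ \frac{\langle \gamma, K^*K\gamma \rangle_1}{\langle \gamma, \gamma \rangle_1} \right]^{1/2}.$$

Thus,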


A- 1) 


(25) 


•j-.  -i-z 

implying that the spectrum of the self-adjoint operator
$A - I$ is uniformly in the unit sphere. Hence, $A$ is strongly
positive.

A special class of problems of interest are the
strictly convex team problems for which

l  1  1  *»  :  -  1’*  - 

21,-1.  3^, -01,  ,  ?“*F7,  Ft*f7  and  *  l- 

For  such  problems  eq.  (15)  simplifies  to 

ny^  -  •:E1(E:[v(y1)  +  g1(y1)E-[g:(y:) 

•  E* [ v (y , )  y,  ]  -  E'tvfy^)  •  y2 J  '•!  ]  } 

-  F'E'tx.yJ  v  D2,?  7i1(yL)E2[g'(y;;)  ■.  E  2  [  x !  y  ,  | 

-  2"[x  y,  3  y i  +  O^F^E^tE'tx:  y,] ;  yLI  , 

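and in Condition 3 inequalities (24a) are replaced
by the single inequality

$$[r(D^1_{12} D^{1*}_{12} D^1_{12} D^{1*}_{12})]^{1/2} = \langle\langle D^1_{12} D^{1*}_{12} \rangle\rangle_1 \le \sigma,$$

where $\sigma$ can be taken to be less than one. Hence, (23)
reads

$$2\sqrt{\sigma_2} + \sqrt{\sigma_4} < 1/\sigma. \qquad (26)$$

We now summarize these results as a corollary to
Theorem 1.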

Corollary 1: Under Conditions 1 and 2 of §2, and (26)
given above, the strictly convex quadratic team problem,
with multiple probability measures and asymmetric
mode of decision making, admits a unique Stackelberg
equilibrium solution $(\gamma^s, T_2[\gamma^s])$, where $\gamma^s$ is the
limit of the iterative scheme (19) with


-  g1(y1)E2[g‘(y, )£*[-■  (y1)  i y2 J ; y t !  1 . 
and  T,  is  given  by  (6). 

Remark 1: When the original problem is a Stackelberg
game, but the probability measures are identical, a
study of the original condition (20) reveals the
inequality

r(s)  -  r(D12D21  +  D;lD12  *  °2iD22D21) ^1 ' 1} 

1  ->  •»*  i*  2*  1  2 

-  r(012D21  +  D21D12  '  321322321) 

This is the existence condition for the standard
stochastic Stackelberg game to admit a unique solution,
which corroborates the earlier result obtained in [1].

4.  Jointly  Gaussian  Distributions 

In decision and control theory, one appealing
class of probability distributions has been the
Gaussian distribution, because it leads to closed-form
solutions in most cases. Indeed, even for the class
of nonstandard multi-criteria stochastic decision
problems with multiple probabilistic models, we have
observed in [1] and [2] that when both subjective
probabilities are Gaussian, the unique stable Nash
equilibrium solution can be obtained in closed form
and is affine in the observations. The question now
is whether this appealing feature also extends to the
model treated in this paper, where the solution concept
is Stackelberg instead of Nash. The main conclusion
of this section is that when the subjective Gaussian
probability distributions are different, there is in
general no counterpart of the results of [1,2] in the
present context; that is, the unique equilibrium solution
will, in general, not be affine. However, for some
special cases which will be delineated in the sequel,
the unique solution will still be affine.

Towards this end, let us adopt the model and
notation of [1, Section 4], and assume validity of [1,
(29a)]. Furthermore, to simplify the discussion to
follow, let us take the mean values of the Gaussian
distributions to be zero. Hence, let


mean  (y^.y,) 


J.G)  under  both  ?*  and  ?* 


:v2\ 


covariance  (y^.y,)  •  Iy  •  -  ^ 


1  -0. 


under  ?  .  a*  1.2,  and  assume 


-l 


l.J-1.2;  j-i 


IT) 


>  28' 


129) 


’  1 


Then, the decision problem will admit a unique linear
solution, if and only if, equation (15) is satisfied by
the decision rule

$$\gamma(y_1) = A y_1 \qquad (30)$$

for some linear bounded operator $A: R^{m_1} \to U_1$. Hence,
using (15), $A$ should be the solution of (by pulling $A$
out of the conditional expectations)

Ay L  »  3,l,3;iAE1:E2Iyi;y,:  y,  ] 

i*  :  ’  >  ; 

-  37,3:,Ag  ( y , ) e  ;s  i y ,)e  >.  v,; 


i-xy,)  •  *  p  .)  ■<•>.) 

k  —  *.«.  i.  k  i  i.  a. 


‘  3213220:iA,*("l'E*:**CyI,E*':''l  '  2 ' 
"  f [x  yj  7.  0;iF;i1ly1)E‘[g'Cy;)E1[x  y,)  y^

-  3'1oi-,F*gliy1)E*[g'iy,)E*[x.y,]  1  y ,  I 

i  '  1  ■»  ai 

-  0^ ,F“EX [E* [x;  v,  1 ’ y^  J  .  Vy^S 


Since the random variables are jointly Gaussian under
both measures,

$$E^i[y_k | y_j] = S^i_{kj} y_j, \qquad (31b)$$

$$E^i[x | y_j] = S^i_{0j} y_j, \quad i, j = 1, 2, \qquad (31c)$$

for some matrices $S^i_{kj}$ and $S^i_{0j}$. In view of this, (31a)
can be rewritten as

Ayi  •  (Di:D:iASi2sn  +  Fisoi  *  DoF?o:s2i)yi 


2  i  i 
+  (D2i3i2ASi2 


2  l  ■>  , 

32i3:20:iASi: 


7*  7  7  7  1  7  7 

G, ) g  (y^)E  Lg  ,  y L ] 


(32) 


This then leads to the following Proposition:

Proposition 3: Let (29) and Condition 3 be satisfied,
and either $P^1_{y_1} \ne P^2_{y_1}$ or $P^1_{y_2} \ne P^2_{y_2}$. Then, the quadratic
Gaussian decision problem with asymmetric mode of decision
making (as formulated in this section) admits a
linear (Stackelberg) equilibrium solution, if and only
if

(i) there exists a bounded linear operator $A: R^{m_1} \to U_1$
satisfying

$$A = D^1_{12} D^2_{21} A S^2_{12} S^1_{21} + F^1_1 S^1_{01} + D^1_{12} F^2_2 S^2_{02} S^1_{21}, \qquad (33a)$$

(ii) this solution also satisfies


3 


.AS 


7  "*  ,  7  7 

321322321AS12 


3213 


1  ,22 

22  2  32 


0 


(33b) 


See [3].
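The gain matrices appearing in the Gaussian analysis are ordinary regression coefficients computed under each DM's own model. As a hedged scalar illustration in Python (the two covariance models below are hypothetical numbers, chosen so that the marginal variances agree while the correlations differ), the nested conditional expectation of one measurement reduces to a product of two such gains.

```python
def gain(cov_kj, var_j):
    """Scalar coefficient S with E[y_k | y_j] = S * y_j for zero-mean Gaussians."""
    return cov_kj / var_j


# Hypothetical perceptions of (y1, y2): identical marginals, different correlation.
model_1 = {"var_y1": 1.0, "var_y2": 2.0, "cov_y1y2": 0.5}
model_2 = {"var_y1": 1.0, "var_y2": 2.0, "cov_y1y2": 1.0}

# Under DM1's model: E^1[y2 | y1] = S1_21 * y1
S1_21 = gain(model_1["cov_y1y2"], model_1["var_y1"])
# Under DM2's model: E^2[y1 | y2] = S2_12 * y2
S2_12 = gain(model_2["cov_y1y2"], model_2["var_y2"])

# E^1[ E^2[y1 | y2] | y1 ] = S2_12 * S1_21 * y1
nested_gain = S2_12 * S1_21
```

In the matrix case the same quantities would be cross-covariance times inverse covariance; the scalar version is used here only to keep the sketch self-contained.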


Remark 2: A sufficient condition for (33a) to admit a
unique solution in the Banach space of linear bounded
operators mapping $R^{m_1}$ into $U_1$ is

$$r(D^1_{12} D^2_{21} D^{2*}_{21} D^{1*}_{12}) \; r(S^2_{12} S^1_{21} S^{1\prime}_{21} S^{2\prime}_{12}) < 1,$$

which is clearly satisfied under Condition 3.
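A linear operator equation of the type (33a), in which the unknown gain is multiplied on the left and on the right by known matrices and a constant term is added, can be solved by direct iteration when the associated map is a contraction. The Python sketch below assumes small matrices `M`, `N` and `C` as hypothetical stand-ins for the left factor, the right factor and the constant term; none of the numbers come from the paper.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def solve_fixed_point(M, N, C, tol=1e-12, max_iter=10_000):
    """Iterate A <- M A N + C from A = 0 until the update is at most tol."""
    rows, cols = len(C), len(C[0])
    A = [[0.0] * cols for _ in range(rows)]
    for _ in range(max_iter):
        MAN = matmul(matmul(M, A), N)
        nxt = [[MAN[i][j] + C[i][j] for j in range(cols)] for i in range(rows)]
        gap = max(abs(nxt[i][j] - A[i][j]) for i in range(rows) for j in range(cols))
        A = nxt
        if gap <= tol:
            return A
    raise RuntimeError("iteration did not converge")


# Hypothetical data; the norms of M and N multiply to less than one.
M = [[0.5, 0.1], [0.0, 0.4]]
N = [[0.6, 0.0], [0.2, 0.5]]
C = [[1.0, 0.0], [0.5, 2.0]]
A = solve_fixed_point(M, N, C)

# Scalar sanity case: A = 0.5 * A * 0.4 + 1 has the solution 1 / (1 - 0.2).
A_scalar = solve_fixed_point([[0.5]], [[0.4]], [[1.0]])
```

A direct alternative would vectorize the equation with a Kronecker product; the iteration is shown because it parallels the successive approximation used throughout the paper.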


The  conditions  of  Prop.  3  are  clearly  non-void; 
because,  given  the  unique  solution  of  (33a),  it  mav  be 

17,  7 

possible  to  cind  F,,  F",  and  30  chat  (33b)  is 

satisfied. However, it should also be clear that
satisfaction of (33b) places some severe restrictions
on the parameters of the problem, which in general will
not be met. Hence, it is fair to say that, if either
$P^1_{y_1} \ne P^2_{y_1}$ or $P^1_{y_2} \ne P^2_{y_2}$, generically the problem does not
admit a linear equilibrium solution, even if it is a
team problem; that is:

Corollary 2: If either $P^1_{y_1} \ne P^2_{y_1}$ or $P^1_{y_2} \ne P^2_{y_2}$ (or both),
the quadratic Gaussian decision problem does not admit
(generically) a linear (Stackelberg) equilibrium
solution. The unique solution, which exists under (29)
and Condition 3, is nonlinear.


The conditions of the preceding Corollary involve
only the marginal distributions of $y_1$ and $y_2$; in the
complement of these conditions we can derive the
following linear solution:

Proposition 4: For the quadratic Gaussian decision
i  2  12 

problem,  let  boch  ?  -P  and  P  *P  ibut  noc  necessar- 


ilv  P‘ 


,  and  even  P  ■?"  ) . 

yly2  yly2 


Then,  if 


2trCD12Q21D21D12),1/2  +  [r(D21D22D21,|1/2  '  1  0i> 

the problem admits a unique Stackelberg equilibrium
solution for DM1 (the leader) which is linear in $y_1$:

$$\gamma^s(y_1) = A y_1 \qquad (35a)$$

where $A: R^{m_1} \to U_1$ is the unique bounded linear operator
solving

17  21  7*  ,*  1  7 

Ayl  ‘  (D12321AS12S21  ^°:i312AS12S21 


-  O^oi^AS^  *  F‘5^ 


12  2  1 
312?2S02S21 


(35b) 


*  °21F2S02S21  “  37i  D77'7Srl?:i7,)F, 


21  32' 2  02  21 


and $S^i_{kj}$ are defined by (31b)-(31c), and $S^1_{01}$ is defined
by $E^1[x | y_1] = S^1_{01} y_1$.

Proof. When $P^1_{y_1} = P^2_{y_1}$ and $P^1_{y_2} = P^2_{y_2}$, $g^1(y_1) = g^2(y_2) = 1$ and

hence  Tpna'iqiyns  1  and  2  of  Theorem  1  are  alwavs 
satisfied  (see  ind  in  .'one;: >1 .  "Then. 

(34) is the counterpart of (22), and hence existence and
uniqueness follow from Theorem 1. Linearity, on the
other hand, follows by noting that if we start iteration
(19) with $\gamma^{(0)} \equiv 0$, since $g^1(y_1) = g^2(y_2) = 1$ every term will
be linear in $y_1$ (see also (32)), and hence the limit
(which exists by Theorem 1) will be linear. Then, substituting
$\gamma(y_1) = A y_1$ in (15), we obtain (35b), by simply
letting $g^1(y_1) = g^2(y_2) = 1$ in (32).

Remark 3: For the special case of strictly convex team
problems (cf. Corollary 1), (34) is replaced by

1  1* 

3<°123L2'\ 
and  (35b)  simplifies  to  (cf.  (25)) 
(36b)


When there is a discrepancy between the DMs' perceptions
of the variances of either $y_1$ or $y_2$, Prop. 4 will
not hold, and the problem will admit (generically) a
nonlinear equilibrium solution, as proven earlier in
Prop. 3 and Corollary 2. In this case, an explicit
closed-form solution cannot be obtained; however, an
approximate solution can be derived by using the iteration
(19) which, for the Gaussian problem, becomes

* 

v  (>V  *  OpD^E fE“[V  Ny/^yjiv,  >  3;,D,  ,;*■  v,  > 
If we start this iteration with $\gamma^{(0)}(y_1) \equiv 0$, or any
linear function of $y_1$, at every iteration we obtain
linear combinations of terms of the type $A^{(k)} y_1$ and
$B^{(k)} y_1 \exp\{-\frac{1}{2} y_1' V^{(k)} y_1\}$, where $A^{(k)}$ and $B^{(k)}$ are
linear operators, and $V^{(k)} > 0$ is an $m_1 \times m_1$ matrix. Since
this is a successive approximation technique under
Condition 3, even stopping the iteration after a finite
number of terms will provide a solution sufficiently
close to the unique optimum. Hence, generically, a
suboptimal policy for DM1, which is sufficiently close
to the unique solution of (15), will be of the form

$$\gamma(y_1) = \sum_{k=1}^{N} \left[ A^{(k)} y_1 + B^{(k)} y_1 \exp\left( -\tfrac{1}{2} y_1' V^{(k)} y_1 \right) \right]$$

where $N$ is a sufficiently large integer (related to the
number of iterations taken in (37)), and $A^{(k)}$, $B^{(k)}$,
$V^{(k)}$ are generated via the iteration (37). Note that as
$N \to \infty$ this solution will uniformly converge to the unique
optimum.
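The structure of such a truncated policy, a sum of linear terms and of linear terms damped by Gaussian-type exponentials, is easy to evaluate numerically. The following scalar Python sketch uses hypothetical coefficient triples; it only illustrates the functional form and why it is not linear in the measurement.

```python
import math


def truncated_policy(y, terms):
    """Sum of a*y + b*y*exp(-0.5*v*y*y) over (a, b, v) triples with v > 0."""
    return sum(a * y + b * y * math.exp(-0.5 * v * y * y) for a, b, v in terms)


# Hypothetical coefficients standing in for the scalar A^(k), B^(k), V^(k).
terms = [(0.8, 0.3, 1.0), (0.05, -0.1, 2.5)]

at_one = truncated_policy(1.0, terms)
at_two = truncated_policy(2.0, terms)
far_out = truncated_policy(10.0, terms)
```

For large measurements the exponential terms vanish and the policy approaches the purely linear part, while near the origin the exponential terms change the effective gain, so doubling the measurement does not double the decision.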


Yet another suboptimal solution can be obtained by
restricting DM1's policies, at the outset, to linear
functions of $y_1$, i.e. to the form (30) where $A$ is a
variable linear operator. DM2's response to any such
policy will also be linear (in $y_2$), thus making $T_2$ in
(6) a linear operator. Then, the problem faced by DM1
is minimization of (7) with $\gamma(y_1) = A y_1$, over all linear
bounded operators $A$. The solution of this minimization
problem will provide DM1 with a linear policy that is
(in general) inferior to the limiting solution of (37),
unless, of course, $g^1(y_1) = g^2(y_2) = 1$, in which case the two
solutions will be the same (satisfying (36b)). We do
not pursue here the details of the derivation of the best
linear solution for the general case (as outlined above).


Furthermore, it is possible to work out the various
conditions obtained for the special cases of finite
dimensional problems (especially the scalar team problem)
and continuous-time problems, and write down the
equilibrium solution explicitly whenever it is linear.
Such an analysis would routinely follow the lines of
the discussion of [1, Section 4], and hence it will not
be included here mainly because of space limitations.


5. Conclusions

This paper has presented an equilibrium theory for
two-person quadratic decision problems with static
information patterns, wherein the decision makers (DMs)
do not necessarily have the same perception of the
underlying probability space, and there is a hierarchy
in decision making. As indicated earlier in Section 1,
when such discrepancies exist, even team problems have
to be analyzed in the framework of nonzero-sum stochastic
games, and because of the presence of hierarchy the
Stackelberg solution becomes the most meaningful equilibrium
concept for such decision problems.

Section 3 of the paper has provided a set of
sufficient conditions for existence and uniqueness of
an equilibrium solution for a general decision problem
defined on Hilbert decision spaces and with arbitrary
multiple probabilistic description. These conditions
also ensure that the solution (more precisely, the
equilibrium policy of the leader) can be obtained as
the limit of an infinite sequence which involves
conditional expectations under two different probability
measures. This sequence is structurally different
from its counterpart in the case when the mode of decision
making is symmetric [1], even for team problems,
and it contains Radon-Nikodym derivatives of the two
probability measures as multiplying factors (which
were absent in the end result of [1]).

This different structure has led, in Section 4, to
a seemingly surprising (unexpected) result for the
special case of Gaussian distributions, namely the unique
equilibrium solution being generically nonlinear in the
measurements. This constitutes the first nonlinear
solution reported in the literature for a quadratic
Gaussian static game or team problem. It should be
noted that we have not given a closed-form expression
for this nonlinear solution, but have provided a
recursive scheme which generates admissible policies
that come arbitrarily close to the optimum solution.
Furthermore, under some assumptions on the relative
structures of the probability measures perceived by the
DMs, we have shown that the unique solution is linear
in the static measurements.

Possible extensions of this study could be carried
out along the lines discussed in some detail in Section
5 of [1]. Particularly, one of the issues that requires
immediate attention is an analysis of the existence
conditions of this paper, and the structure of the
equilibrium solution, when the discrepancies between the
perceived probability measures are sufficiently small,
such as the probability measures being within an
ε-neighborhood of a common nominal one. Indeed, such
an analysis and the ensuing results will provide the
right framework for a further analysis that involves an
investigation of sensitivity and robustness properties
of team solutions (obtained under the common probability
measure alluded to above) in this ε-neighborhood. These
questions are currently under study, and results along
these lines will be reported in the future.
Abstract


This  paper  develops  an  equilibrium  theory  for  two-person  two-criteria 
stochastic  decision  problems  with  static  information  patterns,  wherein  the 
decision  makers  (DM's)  have  different  probabilistic  models  of  the  underlying 
process,  the  objective  functionals  are  quadratic  and  the  decision  spaces  are 
general  inner-product  spaces.  Under  two  different  modes  of  decision  making 
(viz.  symmetric  and  asymmetric),  sufficient  conditions  are  obtained  for  the 
existence and uniqueness of equilibrium solutions (stable in the former case), and in each case a uniformly convergent iterative scheme is developed whereby the equilibrium policies of the DM's can be obtained by evaluating a number of conditional expectations. When the probability measures are Gaussian, the equilibrium solution is linear under the symmetric mode of decision making, whereas it is generically nonlinear in the asymmetric case, with the linear structure prevailing only in some special cases which are delineated in the paper.

1. Introduction

A  team  is  defined  as  a  group  of  agents  who  work  together  in  a 
coordinated  effort,  in  a  possibly  hostile  and  uncertain  environment,  in  order 
to  achieve  a  common  goal.  In  achieving  this  goal,  the  members  of  the  team 
do  not  necessarily  acquire  the  same  information,  and  hence  they  have  to  operate 
in  a  decentralized  mode  of  decision  making.  The  scientific  approach  to 
formulation  and  analysis  of  team  problems  has  involved  (i)  a  quantification  of 
the  underlying  common  goal  in  the  form  of  a  (mathematical)  objective  function 
which  is  sought  to  be  optimized  jointly  by  the  agents,  and  (ii)  a  modeling 
of  the  uncertain  environment  and  the  possible  measurements  made  by  the  agents 
on  this  environment  in  the  form  of  a  probability  space  together  with  an 
appropriate  information  structure  [14,7,15,16].  The  underlying  stipulation  here 
has been the existence of a probability space that is common to all the agents, so that through their priors all members of the team "see the world" in exactly the same way.

One  question  that  readily  comes  into  mind  at  this  point  is  the 
robustness  of  such  a  mathematical  model,  and  the  "optimum"  solutions  it  produces, 
to  slight  variations  in  the  underlying  assumptions.  In  particular,  what  if  the 
agents  perceive  the  outside  world  in  slightly  different  ways?  Would  the 
solution  obtained  under  the  assumption  of  common  prior  probability  measures 
change  drastically  if  there  are  discrepancies  in  the  agents'  perceptions  of  the 
probabilistic  description  of  the  outside  world?  In  order  to  be  able  to  answer 
these  queries  satisfactorily  and  effectively,  we  need  a  theory  of  equilibrium 
for  decision  problems  in  which  the  decision  makers  (DM's)  have  different 
probabilistic models of the system; such a general theory will clearly subsume the currently available results on teams which use a common probability space.

Consider a static team decision problem, formulated in the standard
manner  as  in  [7],  with  the  only  difference  being  in  the  underlying  probability 
space.  In  particular,  assume  that  the  DM's  assign  different  subjective 
probabilities  to  the  uncertain  events,  in  which  case  there  will  not  exist  a 
common  probability  space,  thereby  leading  to  a  different  expected  (average) 
cost  function  for  each  DM.  Hence,  once  we  relax  the  assumption  of  existence 
of  a  common  probability  space,  the  team  problem  is  no  longer  a  stochastic 
optimization  problem  with  a  single  objective  functional,  and  we  inevitably 
have  to  treat  it  as  a  nonzero-sum  stochastic  game  [5,8,12].  Furthermore,  even 
though  the  original  team  decision  problem  with  a  common  probability  space 
will  admit  the  same  team-optimal  solution(s)  regardless  of  the  mode  of 
decision making (that is, regardless of whether the roles of the DM's are symmetric or whether there is a hierarchy and dominance in decision making), this feature ceases to hold true when there exists a discrepancy between the perceived probability measures. When there are only two members, for example, two possibilities emerge in the presence of discrepancies: the totally symmetric roles, corresponding to the Nash equilibrium solution, and the hierarchical mode, corresponding to the Stackelberg equilibrium solution.

Motivated  by  these  considerations,  we  treat  in  this  paper  a  more 
general  (than  team)  class  of  two-person  stochastic  decision  problems  which 
can  be  viewed  as  static  stochastic  nonzero-sum  games  with  the  DM's  having 
different  subjective  probability  measures.  Adopting  both  the  symmetric  and 
asymmetric  modes  of  decision  making,  we  develop  in  each  case  a  general 
theory of equilibrium when the objective functionals are quadratic and the decision spaces are appropriate Hilbert spaces. Such a formulation includes both finite-dimensional (discrete) and continuous-time decision problems, and involves arbitrary probability measures which are, though, restricted a posteriori by the conditions of existence and uniqueness developed in the paper. The special case of Gaussian distributions is studied in considerable depth, and some explicit solutions are obtained with appealing features.

The  organization  of  the  paper  is  as  follows.  The  next  section  (§2) 
provides  a  precise  problem  formulation,  and  introduces  the  two  solution  concepts 
adopted  in  the  paper.  Section  3  develops  general  conditions  for  existence  and 
uniqueness  of  a  stable  equilibrium  solution  under  the  symmetric  mode  of  decision 
making,  and  elucidates  the  extent  of  the  restrictions  imposed  on  the  problem  by 
these  conditions.  Section  4  presents  a  counterpart  of  the  results  of  Section  3 
under  the  asymmetric  mode  of  decision  making,  with  the  mathematical  machinery 
used  being  inherently  different  from  that  of  §3.  Section  5  deals  with  the  special 
class  of  Gaussian  distributions,  under  both  symmetric  and  asymmetric  modes  of 
decision  making.  In  the  former  case  it  is  shown  that  the  unique  stable  equilibrium 
solution  is  affine  in  the  measurements  and  can  be  obtained  explicitly.  In  the 
latter case, however, the solution is generically nonlinear, and contains a summation of terms which involve products of linear functions of the measurements with exponential terms (whose exponents are quadratic in the measurements). The section also contains
some  discussion  on  finite-dimensional  and  continuous-time  problems,  treated  as 
special  cases.  Section  6  is  devoted  to  discussions  on  possible  extensions  of 
these  results  in  different  directions,  provides  some  interpretation  of  the 
general  approach  and  results,  and  includes  some  concluding  remarks.  The  paper  ends 
with five Appendices which include results used in the main body of the paper.

2. Mathematical Formulation and Some Basic Results

2.1. Probability Spaces

Let $\Omega \triangleq \mathbb{R}^n \times \mathbb{R}^{m_1} \times \mathbb{R}^{m_2} = X \times Y_1 \times Y_2$, let $B$ denote the Borel field of subsets of $\Omega$, and let $B^k$ denote the Borel field of subsets of $\mathbb{R}^k$, $k = n, m_1, m_2$. Let $\mathcal{P}$ denote the set of all probability measures on $(\Omega, B)$ with finite second moments, and for each $P \in \mathcal{P}$ denote the corresponding marginal measures on $B^n$, $B^{m_1}$ and $B^{m_2}$ by $P_x$, $P_{y_1}$ and $P_{y_2}$, respectively. Furthermore, let the collection of all such probability measures be denoted by $\mathcal{P}_x$, $\mathcal{P}_{y_1}$ and $\mathcal{P}_{y_2}$, respectively. Then, for each $P \in \mathcal{P}$, the vector $z = (x', y_1', y_2')'$, taking values in $\Omega$, becomes a well-defined random vector on $(\Omega, B, P)$, and likewise $x$ is a random vector on $(\mathbb{R}^n, B^n, P_x)$ and $y_i$ is a random vector on $(\mathbb{R}^{m_i}, B^{m_i}, P_{y_i})$.

Here, $x$ denotes the unknown state of Nature, and $y_i$ denotes an observation of DMi (the i'th decision maker), which is correlated with $x$. We now choose two elements out of $\mathcal{P}$, $P^1$ and $P^2$, which denote the subjective probabilities assigned to $z$ by DM1 and DM2, respectively. For technical reasons, we place some further restrictions on the choices of $P^1$ and $P^2$ through the marginals $P^i_{y_j}$; in particular we assume that

Condition (1). $P^1_{y_2}$ and $P^2_{y_1}$ are absolutely continuous [1] with respect to $P^2_{y_2}$ and $P^1_{y_1}$, respectively; that is, using the standard notation in probability theory,

$$P^1_{y_2} \ll P^2_{y_2}, \qquad P^2_{y_1} \ll P^1_{y_1}. \tag{1}$$

Condition (2). The Radon-Nikodym (R-N) derivative [1]

$$g_i(\xi) = dP^j_{y_i} / dP^i_{y_i}, \qquad j \neq i, \tag{2}$$

is uniformly bounded a.e. $P^i_{y_i}$, $i = 1,2$. □


The necessity of these two conditions in the formulation of our problem will be made clear in the sequel. We should note, however, that for the special case when $P^1$ is equivalent to $P^2$, both of these conditions are satisfied (in the latter case the bound is equal to 1) and we have the standard decision theoretic framework [2] with a single probability space.

2.2. Decision and Policy Spaces

The decision variable of DMi will be denoted by $u_i$, which belongs to a real separable Hilbert space $U_i$ with inner product $(\cdot,\cdot)_i$. Permissible policies (decision rules) for DMi are measurable mappings

$$\gamma_i : \mathbb{R}^{m_i} \to U_i, \qquad \int \|\gamma_i(\xi)\|_i^2 \, P^i_{y_i}(d\xi) < \infty, \tag{3}$$

where $\|\cdot\|_i$ is the natural norm derived from $(\cdot,\cdot)_i$. Let $\Gamma_i$ denote the space of all such policies, which is further equipped with the inner product

$$\langle \gamma, \beta \rangle_i = \int_{Y_i} (\gamma(\xi), \beta(\xi))_i \, P^i_{y_i}(d\xi). \tag{4}$$

Then, we have the following two results, the first of which is standard [3] and the second of which involves a change of measures using the R-N derivative.

Lemma 1. $\Gamma_i$ is a Hilbert space. □

Lemma 2. If Conditions (1) and (2) are satisfied, every element of $\Gamma_i$ has bounded second-order moments also under $P^j_{y_i}$, $j \neq i$. □

2.3.  Cost  Functionals 

Let $D^i_{jj}: U_j \to U_j$ ($i \neq j$; $i,j = 1,2$) be strongly positive bounded linear operators (that is, there exists $\lambda > 0$ such that $(u, D^i_{jj} u)_j \geq \lambda (u,u)_j$ for all $u \in U_j$), and let $D^i_{ij}: U_j \to U_i$ ($i \neq j$) and $F^i_j: X \to U_j$ be bounded linear operators for all $i,j = 1,2$. Furthermore, let $E^i[u(z) \mid y_i]$ denote the mathematical expectation of a z-measurable random variable $u(z)$ taking values in $U_j$, conditioned on the random variable $y_i$ and taken under the probability measure $P^i$, i.e.

$$E^i[u(z) \mid y_i] = \int_{\Omega} u(z) \, P^i_{z|y_i}(dz \mid y_i), \tag{5}$$

where the second term of the integrand is the conditional probability measure derived from $P^i$. Then, for each pair $(\gamma_1,\gamma_2) \in \Gamma_1 \times \Gamma_2$, we have a quadratic expected cost functional for each DM, defined for DMi ($j \neq i$) by

$$J_i(\gamma_1,\gamma_2) = \tfrac{1}{2}\langle \gamma_i,\gamma_i\rangle_i + \tfrac{1}{2}\int_{Y_j} (\gamma_j(\xi), D^i_{jj}\gamma_j(\xi))_j \, P^i_{y_j}(d\xi) - \langle \gamma_i, E^i[F^i_i x \mid y_i]\rangle_i - \int_{X \times Y_j} (\gamma_j(\xi), F^i_j x)_j \, P^i(dx, Y_i, d\xi) - \langle \gamma_i, E^i[D^i_{ij}\gamma_j(y_j) \mid y_i]\rangle_i, \tag{6}$$

every term of which can be shown to be finite, in view of Lemmas 1 and 2. Note that in the absence of Conditions (1) and (2), $J_i$ is not necessarily finite, and hence the problem is not well defined.

It is worth mentioning here that $J_i$ describes a most general type of quadratic cost functional which is strictly convex in $u_i$, and that the formulation here covers also the cases of team problems ($D^i_{jj} = I$, $D^1_{12} = D^{2*}_{21}$, $F^1_i = F^2_i$; $i,j = 1,2$, $i \neq j$) and zero-sum games ($D^i_{jj} = -I$, $D^1_{12} = -D^{2*}_{21}$, $F^1_i = -F^2_i$; $i,j = 1,2$, $i \neq j$). But even in these "single loss-functional" problems, the DM's will have inherently different expected cost functions whenever $P^1$ and $P^2$ are different, since then a common probability space does not exist. This forces us to formulate the problem as a multi-criteria optimization problem and to introduce equilibrium solution concepts that would be appropriate in this framework.

(Footnote.) A superscript (*) designates the adjoint of a given linear operator defined on a Hilbert space, and $I$ designates the identity operator.

2.4. Equilibrium Solution Under the Symmetric Mode of Decision Making

Since  the  expected  cost  functionals  (6),  together  with  the  policy  spaces, 
provide  a  normal  (strategic)  form  description,  regardless  of  the  presence  of 
multiple  probability  measures,  the  standard  definition  of  noncooperative  (Nash) 
equilibrium  [5]  remains  intact,  which  is  the  most  reasonable  solution  concept 
here  under  the  symmetric  mode  of  decision  making. 

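Definition 1. A pair of policies $(\gamma_1^o, \gamma_2^o) \in \Gamma_1 \times \Gamma_2$ constitutes a Nash equilibrium solution if

$$J_1(\gamma_1^o,\gamma_2^o) \leq J_1(\gamma_1,\gamma_2^o), \qquad J_2(\gamma_1^o,\gamma_2^o) \leq J_2(\gamma_1^o,\gamma_2), \qquad \forall \gamma_1 \in \Gamma_1, \ \gamma_2 \in \Gamma_2. \tag{7}$$
□

Definition 2. A Nash equilibrium solution $(\gamma_1^o,\gamma_2^o)$ is stable if for all $(\gamma_1^{(0)},\gamma_2^{(0)}) \in \Gamma_1 \times \Gamma_2$,

$$\lim_{k \to \infty} \gamma_i^{(k)} = \gamma_i^o \quad \text{in } \Gamma_i, \qquad i = 1,2, \tag{8}$$

where

$$\gamma_1^{(k)} = \arg\min_{\gamma_1 \in \Gamma_1} J_1(\gamma_1, \gamma_2^{(k-1)}), \tag{9a}$$

$$\gamma_2^{(k)} = \arg\min_{\gamma_2 \in \Gamma_2} J_2(\gamma_1^{(k-1)}, \gamma_2), \qquad k = 1,2,\ldots \tag{9b}$$
□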

Remark 1. The notion of stable equilibrium makes particular sense (and is of paramount importance) in decision problems wherein the DM's have different priors on the uncertain quantities, because it is determined as the outcome of a natural iterative process. In this process, each DM responds optimally (using his priors) to the most recent decision (policy) of the other DM, with the priors on which this decision is based being irrelevant. In other words, even though the computation of the Nash equilibrium solution will depend on the different prior probability measures perceived by the two DM's, in the iterative procedure that leads to this equilibrium each DM has to know only his own prior and the other one's announced policy at the previous step. For an earlier utilization of this concept in a deterministic setting we refer the reader to [28]. □


2.5.  Equilibrium  Solution  Under  an  Asymmetric  Mode  of  Decision  Making 

In the case of the asymmetric mode there is a hierarchy in decision making, which permits one DM (say DM1, the leader) to announce and enforce his policy on the other DM (the follower). The relevant solution concept here is the leader-follower (Stackelberg) solution, which is introduced below.

Definition 3. A pair of policies $(\gamma_1^s, \gamma_2^s) \in \Gamma_1 \times \Gamma_2$ constitutes a leader-follower (Stackelberg) equilibrium solution with unique follower responses, if there exists a unique mapping $T_2: \Gamma_1 \to \Gamma_2$ satisfying

$$J_2(\gamma_1, T_2[\gamma_1]) \leq J_2(\gamma_1,\gamma_2), \qquad \forall (\gamma_1,\gamma_2) \in \Gamma_1 \times \Gamma_2, \tag{10}$$

and furthermore

$$J_1(\gamma_1^s, T_2[\gamma_1^s]) \leq J_1(\gamma_1, T_2[\gamma_1]), \qquad \forall \gamma_1 \in \Gamma_1, \tag{11}$$

with $\gamma_2^s = T_2[\gamma_1^s]$. □

Remark 2. The uniqueness condition on $T_2$ is satisfied in our case, because $J_2$ is strictly convex (and quadratic) in $\gamma_2$. □

Remark 3. The solution introduced above may not, at first glance, appear to be an equilibrium solution, because of the strict ordering of the DM's. However, it can be shown, by following an argument first developed in [17], that the Stackelberg solution can be viewed as the so-called "strong equilibrium" of a decision problem with a modified (dynamic) information pattern [see Appendix E]. □

3. General Conditions for a Stable Equilibrium Solution Under the Symmetric Mode

We now obtain some general conditions for existence of stable equilibrium solutions under the symmetric mode of decision making, and also consider some special cases when the probability measures of both DM's are absolutely continuous with respect to the Lebesgue measure (i.e. when densities exist). Firstly we have

Proposition 1. A pair of policies $(\gamma_1^o, \gamma_2^o) \in \Gamma_1 \times \Gamma_2$ constitutes a Nash equilibrium solution to the decision problem of §2 if, and only if, it satisfies the pair of equations (under the notation of (5)):

$$\gamma_1^o(y_1) = D^1_{12} E^1[\gamma_2^o(y_2) \mid y_1] + F^1_1 E^1[x \mid y_1], \tag{12a}$$

$$\gamma_2^o(y_2) = D^2_{21} E^2[\gamma_1^o(y_1) \mid y_2] + F^2_2 E^2[x \mid y_2]. \tag{12b}$$

Proof. This result follows from a simple minimization of the two quadratic forms $J_1(\gamma_1,\gamma_2^o)$ and $J_2(\gamma_1^o,\gamma_2)$ on the two Hilbert spaces $\Gamma_1$ and $\Gamma_2$, respectively, and by virtue of the fact that these two quadratic forms are positive definite in the relevant variables. □

By the same argument used in the proof of Proposition 1, relations (9a) and (9b) in Def. 2 can equivalently be written as

$$\gamma_1^{(k)}(y_1) = D^1_{12} E^1[\gamma_2^{(k-1)}(y_2) \mid y_1] + F^1_1 E^1[x \mid y_1], \tag{13a}$$

$$\gamma_2^{(k)}(y_2) = D^2_{21} E^2[\gamma_1^{(k-1)}(y_1) \mid y_2] + F^2_2 E^2[x \mid y_2], \qquad k = 1,2,\ldots \tag{13b}$$

Now, substituting (13b) into (13a), and also (13a) into (13b), by appropriately matching the superscripts, we arrive at the following two recursive relations:

$$\gamma_i^{(k)}(y_i) = D^i_{ij} D^j_{ji} E^i[E^j[\gamma_i^{(k-2)}(y_i) \mid y_j] \mid y_i] + F^i_i E^i[x \mid y_i] + D^i_{ij} F^j_j E^i[E^j[x \mid y_j] \mid y_i], \tag{14}$$

$j,i = 1,2$; $j \neq i$; $k = 2,4,\ldots$ or $k = 3,5,\ldots$

Note that if the recursive scheme (14) converges for even values of k, it also converges (to the same limit) for odd values of k [this follows from expressions (13a)-(13b)]. Hence, we confine attention only to even values of k and obtain the following result as a direct consequence of the foregoing analysis:

Proposition 2. A pair of policies $(\gamma_1^o,\gamma_2^o) \in \Gamma_1 \times \Gamma_2$ constitutes a stable Nash equilibrium solution if, and only if, for all $(\gamma_1^{(0)},\gamma_2^{(0)}) \in \Gamma_1 \times \Gamma_2$,

$$\gamma_i^o(y_i) = \lim_{k \to \infty} \gamma_i^{(2k)}(y_i) \quad \text{in } \Gamma_i, \tag{15}$$

where $\gamma_i^{(2k)}$, $k = 1,2,\ldots$, is given recursively by (14). Furthermore, such a stable equilibrium solution is necessarily unique. □
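The recursion (13a)-(13b) can be illustrated numerically in a deliberately simplified setting. The sketch below is not part of the paper: it assumes scalar variables, zero-mean jointly Gaussian subjective models for each DM, and policies restricted to the linear form $\gamma_i(y_i) = a_i y_i$, so that every conditional expectation reduces to a regression coefficient. The function names and all numerical values are hypothetical; both covariance matrices give the observations unit variance, so that the marginals of $y_1$ and $y_2$ coincide under the two models and Conditions (1) and (2) hold trivially.

```python
import numpy as np

# Index convention for the 3x3 covariance matrices: 0 -> x, 1 -> y1, 2 -> y2.


def regression_gains(cov, i):
    """Return (k, c) with E[x | y_i] = k * y_i and E[y_j | y_i] = c * y_i, j != i."""
    j = 3 - i
    return cov[0, i] / cov[i, i], cov[j, i] / cov[i, i]


def best_response_iteration(cov1, cov2, d12, d21, f1, f2, a0=(0.0, 0.0), steps=200):
    """Iterate (13a)-(13b) on the gains of linear policies gamma_i(y_i) = a_i * y_i."""
    k1, c1 = regression_gains(cov1, 1)  # DM1 uses only his own model
    k2, c2 = regression_gains(cov2, 2)  # DM2 uses only his own model
    a1, a2 = a0
    for _ in range(steps):
        # Simultaneous update: each DM responds to the other's previous policy.
        a1, a2 = d12 * c1 * a2 + f1 * k1, d21 * c2 * a1 + f2 * k2
    return a1, a2


def equilibrium_gains(cov1, cov2, d12, d21, f1, f2):
    """Solve the scalar, linear-policy version of (12a)-(12b) directly."""
    k1, c1 = regression_gains(cov1, 1)
    k2, c2 = regression_gains(cov2, 2)
    A = np.array([[1.0, -d12 * c1], [-d21 * c2, 1.0]])
    b = np.array([f1 * k1, f2 * k2])
    return tuple(np.linalg.solve(A, b))


# Hypothetical subjective models of DM1 and DM2 (both positive definite).
cov1 = np.array([[1.0, 0.8, 0.6], [0.8, 1.0, 0.5], [0.6, 0.5, 1.0]])
cov2 = np.array([[1.0, 0.7, 0.9], [0.7, 1.0, 0.6], [0.9, 0.6, 1.0]])

print(best_response_iteration(cov1, cov2, 0.5, 0.4, 1.0, 1.0))
print(equilibrium_gains(cov1, cov2, 0.5, 0.4, 1.0, 1.0))
```

In this scalar sketch the iteration contracts whenever the product of the two coupling terms, `d12 * c1 * d21 * c2`, has magnitude less than one, and the limit then does not depend on the starting gains.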

Let us now introduce linear operators $\mathcal{S}_i: \Gamma_i \to \Gamma_i$, $i = 1,2$, by

$$\mathcal{S}_i(\gamma) = D^i_{ij} D^j_{ji} E^i[E^j[\gamma(y_i) \mid y_j] \mid y_i], \qquad j \neq i; \ i,j = 1,2. \tag{16}$$

Note that $\mathcal{S}_i$ indeed maps $\Gamma_i$ into $\Gamma_i$, because the conditional expectation $E^j[D^j_{ji}\gamma(y_i) \mid y_j]$ maps $\Gamma_i$ into $\Gamma_j$ ($j \neq i$) when the probability measures satisfy Conditions (1) and (2), and every element of $\Gamma_i$ is square-integrable under both $P^i_{y_i}$ and $P^j_{y_i}$ (cf. Lemma 2).

Furthermore, let us introduce the notation $\langle\langle \mathcal{S} \rangle\rangle_i$ to denote the norm of a linear bounded operator $\mathcal{S}: \Gamma_i \to \Gamma_i$, which is defined by

$$\langle\langle \mathcal{S} \rangle\rangle_i = \sup_{\gamma \in \Gamma_i} \left[ \langle \mathcal{S}\gamma, \mathcal{S}\gamma \rangle_i / \langle \gamma,\gamma \rangle_i \right]^{1/2}, \tag{17a}$$

and $r_i(\mathcal{S})$ to denote the spectral radius of $\mathcal{S}$, which is defined by [see Appendix A]

$$r_i(\mathcal{S}) = \limsup_{k \to \infty} \left[ \langle\langle \mathcal{S}^k \rangle\rangle_i \right]^{1/k}, \tag{17b}$$

where $\mathcal{S}^k$ denotes the k'th power of $\mathcal{S}$. Finally, let us introduce the linear operators

$$\bar{D}^i = D^i_{ij} D^j_{ji}, \tag{18a}$$

$$\bar{P}_{i|i} = E^i[E^j[\,\cdot \mid y_j] \mid y_i], \tag{18b}$$

both of which map $\Gamma_i$ into itself (the former also maps $U_i$ into itself). Then, the following theorem, whose proof depends on a contraction mapping argument (see Appendix B), provides a set of necessary and sufficient conditions for existence of the unique equilibrium solution alluded to in Prop. 2.


Theorem 1. (i) Under Conditions (1) and (2), the decision problem of Section 2 admits a unique stable Nash equilibrium solution given by (15) if, and only if, there exists, for at least one $i = 1,2$, a $\rho^i$, $0 < \rho^i < 1$, such that

$$r_i(\mathcal{S}_i) = r_i(\bar{D}^i \bar{P}_{i|i}) < \rho^i. \tag{19}$$

(ii) A set of sufficient conditions for (19) to hold true is the existence of a pair of positive scalars $(\rho^i_1, \rho^i_2)$ such that

$$\rho^i_1 \rho^i_2 < 1, \qquad r_i(\bar{D}^i) \leq \rho^i_1, \qquad r_i(\bar{P}_{i|i}) \leq \rho^i_2. \tag{20a}$$

Furthermore, a set of sufficient conditions for the latter two is

$$\langle\langle \bar{D}^i \rangle\rangle_i = \|\bar{D}^i\|_i \leq \rho^i_1, \qquad \langle\langle \bar{P}_{i|i} \rangle\rangle_i \leq \rho^i_2, \tag{20b}$$

where $\|\cdot\|_i$ denotes the operator norm on $U_i$, as a counterpart of (17a).

Proof. See Appendix B. □
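The distinction between the norm (17a) and the spectral radius (17b) that underlies Theorem 1 can be made concrete in a finite analogue. The following sketch is an illustration only, under assumptions that are not in the paper: $y_1$ and $y_2$ take finitely many values, each DM's subjective model is a joint probability table with full support, policies are scalar valued, and the deterministic part $\bar{D}^1$ is a scalar `d_bar`. All function names and numbers are hypothetical. In this finite setting the composite conditional expectation is a row-stochastic matrix, so its spectral radius equals one whatever the discrepancy between the two tables, while its norm in $\Gamma_1$ is only bounded below by that value.

```python
import numpy as np


def composite_expectation(P1, P2):
    """Matrix of gamma -> E^1[E^2[gamma(y1) | y2] | y1] for tables P[y1, y2]."""
    C2 = P2.T / P2.sum(axis=0)[:, None]  # C2[y2, y1] = P2(y1 | y2)
    C1 = P1 / P1.sum(axis=1)[:, None]    # C1[y1, y2] = P1(y2 | y1)
    return C1 @ C2


def spectral_radius(S):
    """Largest eigenvalue modulus of the matrix S."""
    return float(np.max(np.abs(np.linalg.eigvals(S))))


def weighted_norm(S, w):
    """Operator norm induced by the inner product sum_a w[a] * u[a] * v[a], cf. (17a)."""
    r = np.sqrt(w)
    return float(np.linalg.norm((r[:, None] * S) / r[None, :], 2))


def gelfand_estimate(S, w, k):
    """Finite-k value of the expression whose limit defines r(S) in (17b)."""
    return weighted_norm(np.linalg.matrix_power(S, k), w) ** (1.0 / k)


# Hypothetical subjective joint tables of DM1 and DM2 over (y1, y2).
P1 = np.array([[0.30, 0.10], [0.15, 0.45]])
P2 = np.array([[0.20, 0.25], [0.05, 0.50]])
d_bar = 0.5 * 0.8                 # scalar stand-in for D^1_12 * D^2_21
w = P1.sum(axis=1)                # weights of the inner product on Gamma_1

S = d_bar * composite_expectation(P1, P2)
print(spectral_radius(S), weighted_norm(S, w), gelfand_estimate(S, w, 200))
```

With these assumptions condition (19) reduces to `abs(d_bar) < 1`, which is the finite counterpart of the separation into deterministic and stochastic parts offered by (20a).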


Part (ii) of Thm. 1 provides a partial separation (in terms of sufficient conditions) of the deterministic and stochastic parts of the system. Now, if the decision problem is a team problem with a common loss functional [which requires $D^i_{jj} = I$, $D^1_{12} = D^{2*}_{21}$, $F^1_1 = F^2_1$ and $F^1_2 = F^2_2$], and if the team cost is strictly convex in the pair $(u_1,u_2)$ [which is true if and only if $\|D^1_{12} D^2_{21}\|_1 = \rho < 1$], it follows that the first inequality of (20b) holds with $\rho^i_1 < 1$. If, furthermore, the subjective probability measures assigned to the pair $(y_1,y_2)$ by the two DM's are equivalent, $\bar{P}_{i|i}$ becomes the product of two projection operators, thus leading to satisfaction of the second inequality in (20b) with $\rho^1_2 = \rho^2_2 = 1$, and thereby to satisfaction of (20a). Hence, as a corollary to the second part of Thm. 1, we obtain the following result, which is known in different contexts [7,8,9].

Corollary 1. For the strictly convex quadratic team problem with equivalent subjective probability measures assigned by the two DM's to $(y_1,y_2)$, there exists a unique stable equilibrium solution (the so-called team-optimal solution), irrespective of the underlying common probability measure. □

For team problems with $P^1 \neq P^2$, a result along the lines of Corollary 1 does not in general hold, because the operator $\bar{P}_{i|i}$ is not necessarily the product of two projection operators. Then, the general condition is (19) [or the stronger one, (20a)], which places some restrictions on the parameters of the cost functional, as well as on the probability measures $P^1$ and $P^2$. To delineate the extent of these restrictions, we now study the second inequality of (20b) somewhat further and obtain the following sufficient condition.

Corollary 2. For a given $\rho^i_2$, the second inequality of (20b) is satisfied if the expression

$$g_i(y_i) E^j[g_j(y_j) \mid y_i] = g_i(y_i) \int_{Y_j} g_j(\eta) \, P^j_{y_j|y_i}(d\eta \mid \xi = y_i) \tag{21a}$$

is uniformly bounded from above by $(\rho^i_2)^2$ [a.e. $P^i_{y_i}$]. Furthermore, if the probability measures $P^1$ and $P^2$ are absolutely continuous with respect to the Lebesgue measure, this condition can be expressed equivalently in terms of the probability densities $p^i(y_i,y_j)$ as follows:

$$\frac{p^j_{y_i}(y_i)}{p^i_{y_i}(y_i)} \int_{Y_j} \left[ p^i_{y_j}(\eta) \, p^j_{y_j|y_i}(\eta \mid y_i) / p^j_{y_j}(\eta) \right] d\eta \leq (\rho^i_2)^2. \tag{21b}$$

Proof. For (21a) see Appendix C; (21b) follows readily from (21a). □

(Footnote to Corollary 1.) This result is slightly more general than the related ones that can be found in [7,8,9], since here $P^1$ is allowed to be different from $P^2$, though still a restriction is imposed on these (indirectly) via the equivalence between $P^1_{y_1 y_2}$ and $P^2_{y_1 y_2}$.

4. General Sufficient Conditions for a Stackelberg Equilibrium Solution

We now turn our attention to the asymmetric mode of decision making, obtain some general sufficient conditions for existence of a Stackelberg equilibrium solution, and provide a complete characterization of the solution. Subsequently we consider some special cases with some further structure imposed on the cost functions and the probability measures.

Firstly we obtain an expression for DM2's unique reaction $T_2: \Gamma_1 \to \Gamma_2$, as defined by (10), using Prop. 1:

$$T_2[\gamma_1] = \gamma_2^o(y_2) = D^2_{21} E^2[\gamma_1(y_1) \mid y_2] + F^2_2 E^2[x \mid y_2]. \tag{22}$$

Hence, the derivation of the leader's Stackelberg policy $\gamma_1^s \in \Gamma_1$ involves (in view of (11)) the minimization of $J_1$ over $\Gamma_1$ after $\gamma_2^o$ given by (22) is substituted in. This substitution yields


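$$\begin{aligned}
\tilde{J}(\gamma) \triangleq J_1(\gamma, T_2[\gamma]) = {} & \tfrac{1}{2}\langle \gamma,\gamma \rangle_1 + \tfrac{1}{2} \int_{Y_2} \big( F^2_2 E^2[x \mid y_2] + D^2_{21} E^2[\gamma(y_1) \mid y_2], \; D^1_{22} D^2_{21} E^2[\gamma(y_1) \mid y_2] + D^1_{22} F^2_2 E^2[x \mid y_2] \big)_2 \, P^1_{y_2}(dy_2) \\
& - \langle \gamma, E^1[F^1_1 x \mid y_1] \rangle_1 - \int_{X \times Y_2} \big( D^2_{21} E^2[\gamma(y_1) \mid y_2] + F^2_2 E^2[x \mid y_2], \; F^1_2 x \big)_2 \, P^1(dx, Y_1, dy_2) \\
& - \langle \gamma, E^1[D^1_{12} D^2_{21} E^2[\gamma(y_1) \mid y_2] \mid y_1] + E^1[D^1_{12} F^2_2 E^2[x \mid y_2] \mid y_1] \rangle_1,
\end{aligned} \tag{23}$$

where we have deleted the subscript 1 in $\gamma_1$ in order to simplify the notation. Now, since $\Gamma_1$ is a linear space, and $\tilde{J}$ is the sum of terms homogeneous of degree zero, one and two (maximum), any minimizing solution $\gamma \in \Gamma_1$ will have to satisfy

$$\Delta \tilde{J}(\gamma; h) \triangleq \tilde{J}(\gamma + h) - \tilde{J}(\gamma) = \delta \tilde{J}(\gamma; h) + \delta^2 \tilde{J}(\gamma; h) \geq 0, \qquad \forall h \in \Gamma_1, \tag{24}$$

where $\delta^i \tilde{J}(\gamma; h)$ is the Gateaux variation of $\tilde{J}(\gamma)$ of degree i (here $\delta^1 \tilde{J}$ is written simply as $\delta \tilde{J}$). Extensive manipulations, details of which are given in Appendix D (subsection 1), lead to the following expressions for $\delta \tilde{J}$ and $\delta^2 \tilde{J}$:

$$\delta \tilde{J}(\gamma; h) = \langle h, \gamma \rangle_1 - \int_{Y_1} (h(y_1), (\mathcal{L}\gamma)(y_1))_1 \, P^1_{y_1}(dy_1) - \int_{Y_1} (h(y_1), \beta(y_1))_1 \, P^1_{y_1}(dy_1), \tag{25}$$

$$\delta^2 \tilde{J}(\gamma; h) = \tfrac{1}{2}\langle h,h \rangle_1 + \tfrac{1}{2} \int_{Y_1} \big( h(y_1), \; g_1(y_1) E^2[g_2(y_2) D^{2*}_{21} D^1_{22} D^2_{21} E^2[h(\xi) \mid y_2] \mid y_1] \big)_1 \, P^1_{y_1}(dy_1) - \langle h, D^1_{12} D^2_{21} \bar{P}_{1|1} h \rangle_1, \tag{26}$$

where $\mathcal{L}: \Gamma_1 \to \Gamma_1$ and $\beta \in \Gamma_1$ are defined by

$$(\mathcal{L}\gamma)(y_1) = D^1_{12} D^2_{21} \tilde{P}_{1|1}\gamma(y_1) + D^{2*}_{21} D^{1*}_{12} \tilde{P}^*_{1|1}\gamma(y_1) - D^{2*}_{21} D^1_{22} D^2_{21} \, g_1(y_1) E^2[g_2(y_2) E^2[\gamma(y_1) \mid y_2] \mid y_1], \tag{27a}$$

$$\begin{aligned}
\beta(y_1) = {} & F^1_1 E^1[x \mid y_1] - D^{2*}_{21} D^1_{22} F^2_2 \, g_1(y_1) E^2[g_2(y_2) E^2[x \mid y_2] \mid y_1] \\
& + D^1_{12} F^2_2 E^1[E^2[x \mid y_2] \mid y_1] + D^{2*}_{21} F^1_2 \, g_1(y_1) E^2[g_2(y_2) E^1[x \mid y_2] \mid y_1];
\end{aligned} \tag{27b}$$

$\tilde{P}_{1|1}: \Pi_1 \to \Pi_1$ is a linear operator given by

$$\tilde{P}_{1|1}\gamma(y_1) = E^1[E^2[\gamma(y_1) \mid y_2] \mid y_1], \tag{28}$$

$\Pi_1$ is the space of $y_1$-measurable random variables taking values in $U_1$, and $g_i(\cdot)$ are the R-N derivatives (2). Note that $\tilde{P}_{1|1}$ is related to $\bar{P}_{1|1}$ defined by (18b) by

$$\tilde{P}_{1|1}\gamma(y_1) = (\bar{P}_{1|1}\gamma)(y_1),$$

where the latter (which is a mapping from $\Gamma_1$ into $\Gamma_1$) has been used in (26) and will also be used in the sequel whenever needed.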

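Now, since (24) is also equivalent to

$$\delta \tilde{J}(\gamma; h) = 0 \quad \forall h \in \Gamma_1, \qquad \delta^2 \tilde{J}(\gamma; h) \geq 0 \quad \forall h \in \Gamma_1, \tag{29}$$

a Stackelberg solution $\gamma \in \Gamma_1$ will exist for the leader if, and only if,

(i) (26) is nonnegative definite,

and (from (25)):

(ii) $$\gamma(y_1) - (\mathcal{L}\gamma)(y_1) - \beta(y_1) = 0, \quad \text{a.e. } P^1_{y_1}. \tag{30}$$

Since the first of these conditions does not depend on $\gamma$, the optimal solution is solely determined by (30), which can be rewritten as

$$\begin{aligned}
\gamma(y_1) = {} & D^1_{12} D^2_{21} E^1[E^2[\gamma(y_1) \mid y_2] \mid y_1] + D^{2*}_{21} D^{1*}_{12} \, g_1(y_1) E^2[g_2(y_2) E^1[\gamma(y_1) \mid y_2] \mid y_1] \\
& + D^{2*}_{21} F^1_2 \, g_1(y_1) E^2[g_2(y_2) E^1[x \mid y_2] \mid y_1] - D^{2*}_{21} D^1_{22} D^2_{21} \, g_1(y_1) E^2[g_2(y_2) E^2[\gamma(y_1) \mid y_2] \mid y_1] \\
& + F^1_1 E^1[x \mid y_1] - D^{2*}_{21} D^1_{22} F^2_2 \, g_1(y_1) E^2[g_2(y_2) E^2[x \mid y_2] \mid y_1] + D^1_{12} F^2_2 E^1[E^2[x \mid y_2] \mid y_1],
\end{aligned} \tag{31}$$

where we have utilized the fact that the adjoint of $\tilde{P}_{1|1}$ is a linear operator $\tilde{P}^*_{1|1}: \Pi_1 \to \Pi_1$, given by [see Appendix D]

$$\tilde{P}^*_{1|1}\gamma(y_1) = \int_{Y_1} \int_{Y_2} \gamma(\eta) \, \frac{P^1(d\eta \times dy_2) \, P^2(dy_1 \times dy_2)}{P^2_{y_2}(dy_2) \, P^1_{y_1}(dy_1)} = g_1(y_1) E^2[g_2(y_2) E^1[\gamma(y_1) \mid y_2] \mid y_1]. \tag{32}$$

Furthermore, condition (i) can be rewritten as

$$A \triangleq \tfrac{1}{2} I + \tfrac{1}{2} D^{2*}_{21} D^1_{22} D^2_{21} K - D^1_{12} D^2_{21} \bar{P}_{1|1} \geq 0, \tag{33}$$

where $I: \Gamma_1 \to \Gamma_1$ is the identity operator, and $K: \Gamma_1 \to \Gamma_1$ is defined by

$$(K\gamma)(y_1) = g_1(y_1) E^2[g_2(y_2) E^2[\gamma(y_1) \mid y_2] \mid y_1]. \tag{34}$$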

We  now  summarize  these  results  in  the  following  proposition: 


Proposition 3. Under Conditions (1) and (2), the decision problem with multiple probability measures admits a Stackelberg equilibrium solution if, and only if, $A$ is nonnegative definite and (31) admits a solution in $\Gamma_1$. □

Equation (31) will, in general, not admit a closed-form solution, even if all random variables are jointly Gaussian distributed (see §5.3); therefore, we will have to resort to numerical computations which will involve a recursion of some type. Hence, in analyzing the conditions of existence of a solution to (31) we may also require that such a numerical scheme be globally convergent (or stable). One appealing scheme whereby a unique solution to (31) [or, equivalently, (30)] can be obtained is the recursion

$$\gamma^{(k)}(y_1) = (\mathcal{L}\gamma^{(k-1)})(y_1) + \beta(y_1), \qquad k = 1,2,\ldots, \tag{35}$$

where $\gamma^{(0)}$ is chosen as an arbitrary element of $\Gamma_1$. If the limit $\lim_{k \to \infty} \gamma^{(k)} \triangleq \gamma^s$ exists in $\Gamma_1$ for such initial choices, then $\gamma^s$ will necessarily constitute a solution to (31). A sufficient condition for this readily follows from Lemma B.1, which we give below as Prop. 4.

Proposition 4. In addition to the conditions of Prop. 3, assume that there exists a scalar $\rho$, $0 < \rho < 1$, such that

$$r(\mathcal{L}) < \rho, \tag{36}$$

where $r(\mathcal{L})$ is the spectral radius of $\mathcal{L}$. Then, the decision problem admits a unique Stackelberg equilibrium solution $(\gamma^s, T_2[\gamma^s])$, where $\gamma^s \in \Gamma_1$ is the limit of the iterative scheme (35), and $T_2$ is the affine operator (22). □
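The role of the spectral-radius condition in the recursion (35) can be seen in a finite-dimensional sketch. The code below is illustrative only: `L` and `beta` are arbitrary small arrays standing in for the operator and the forcing term of (35), not quantities derived from the decision problem, and the function name `iterate_affine` is hypothetical. The matrix is chosen so that its spectral radius is 0.5 while its Euclidean operator norm exceeds 2, which shows that convergence of the scheme is governed by the spectral radius and not by the norm.

```python
import numpy as np


def iterate_affine(L, beta, gamma0, tol=1e-12, max_iter=10_000):
    """Run gamma_k = L @ gamma_{k-1} + beta until successive iterates agree."""
    gamma = np.asarray(gamma0, dtype=float)
    for k in range(1, max_iter + 1):
        nxt = L @ gamma + beta
        if np.linalg.norm(nxt - gamma) <= tol:
            return nxt, k
        gamma = nxt
    raise RuntimeError("iteration did not converge within max_iter steps")


# Hypothetical stand-ins: spectral radius 0.5, operator norm larger than 2.
L = np.array([[0.5, 2.0], [0.0, 0.5]])
beta = np.array([1.0, 1.0])

gamma_s, n_steps = iterate_affine(L, beta, np.zeros(2))
direct = np.linalg.solve(np.eye(2) - L, beta)  # fixed point of gamma = L gamma + beta
print(gamma_s, direct, n_steps)
```

Starting the iteration from a different initial vector leads to the same limit, mirroring the requirement that the limit in (35) exist for arbitrary initial choices.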

We now further elaborate on (36), so as to bring it to a form which separates out the contributions from the deterministic and probabilistic components of the problem. [Here, we are seeking sufficient conditions which would constitute the counterpart of (20) in this context.] Towards this end, let us first note that, using (34) in (27a),

$$r(\mathcal{L}) = r\big( D^1_{12} D^2_{21} \bar{P}_{1|1} + D^{2*}_{21} D^{1*}_{12} \bar{P}^*_{1|1} - D^{2*}_{21} D^1_{22} D^2_{21} K \big), \tag{37}$$

and utilizing the inequality relationship between the spectral radius and norm of an operator (see Appendix A, Lemma A.1) this can be bounded from above by

$$\leq \langle\langle D^1_{12} D^2_{21} \bar{P}_{1|1} + D^{2*}_{21} D^{1*}_{12} \bar{P}^*_{1|1} - D^{2*}_{21} D^1_{22} D^2_{21} K \rangle\rangle_1,$$

where $\langle\langle \cdot \rangle\rangle_1$ is the operator norm as defined in (17a). Using the standard (triangle inequality) property of norms, this can further be bounded from above by

$$\leq \langle\langle D^1_{12} D^2_{21} \bar{P}_{1|1} + D^{2*}_{21} D^{1*}_{12} \bar{P}^*_{1|1} \rangle\rangle_1 + \langle\langle D^{2*}_{21} D^1_{22} D^2_{21} K \rangle\rangle_1.$$

Now since both $D^{2*}_{21} D^1_{22} D^2_{21}$ and $K$ map a space ($\Gamma_1$) into itself, using the norm inequality for products of linear operators, we further have

$$\begin{aligned}
& \leq \langle\langle D^1_{12} D^2_{21} \bar{P}_{1|1} + D^{2*}_{21} D^{1*}_{12} \bar{P}^*_{1|1} \rangle\rangle_1 + \langle\langle D^{2*}_{21} D^1_{22} D^2_{21} \rangle\rangle_1 \, \langle\langle K \rangle\rangle_1 \\
& = r\big( D^1_{12} D^2_{21} \bar{P}_{1|1} + D^{2*}_{21} D^{1*}_{12} \bar{P}^*_{1|1} \big) + r\big( D^{2*}_{21} D^1_{22} D^2_{21} \big) \, [r(K^* K)]^{1/2},
\end{aligned}$$

where the equality follows because (i) the spectral radius and norm of a self-adjoint linear operator are equal [13, p. 514], and (ii) the norm of a "non-self-adjoint" linear operator $K$ is equal to the square root of the spectral radius of the self-adjoint operator $K^* K$ (see Appendix A, Lemma A.1). Finally, using the result of Lemma A.2 (Appendix A), the latter is bounded from above by

$$r(\mathcal{L}) \leq 2 \, [r(D^1_{12} D^2_{21} D^{2*}_{21} D^{1*}_{12})]^{1/2} \, [r(\bar{P}^*_{1|1} \bar{P}_{1|1})]^{1/2} + r(D^{2*}_{21} D^1_{22} D^2_{21}) \, [r(K^* K)]^{1/2}. \tag{38}$$

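Now, let us assume the following:

Condition (3). There exist four positive scalars $\rho_1, \rho_2, \rho_3, \rho_4$, satisfying

$$2\rho_1\rho_2 + \rho_3\rho_4 < 1 \tag{39}$$

such that

$$r(D^1_{12}D^2_{21}D^{2*}_{21}D^{1*}_{12}) \le (\rho_1)^2, \qquad r(D^{2*}_{21}D^1_{22}D^2_{21}) \le \rho_3 \tag{40a}$$

$$r(P^*_{1,1}P_{1,1}) \le (\rho_2)^2, \qquad r(K^*K) \le (\rho_4)^2. \tag{40b}$$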

Then, we have

Theorem 2. Under Conditions (1)-(2) of §2 and Condition (3) given above, the decision problem admits a unique Stackelberg equilibrium solution $(\gamma^s, T_2[\gamma^s])$, where $\gamma^s \in \Gamma_1$ is the limit of the iterative scheme (35), and $T_2$ is given by (22).


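Proof. The result follows from Prop. 4 and the discussion and derivation that leads to Condition (3), provided we show that the given three conditions subsume (33), i.e. nonnegativity of operator $A$. We now verify that Condition (3) in fact implies that $A$ is a strongly positive operator. First note that $A$ is self-adjoint, because $K$ commutes with $D^{2*}_{21}D^1_{22}D^2_{21}$. Hence, using Lemma A.3 (Appendix A), we can write down the inequality

$$r(A-I) \le \tfrac12\, r\bigl(D^{2*}_{21}D^1_{22}D^2_{21}(K+K^*)\bigr) + r\bigl(D^1_{12}D^2_{21}P_{1,1} + D^{2*}_{21}D^{1*}_{12}P^*_{1,1}\bigr).$$

Then, using the line of arguments that led to (38) from (37), and the spectral radius inequality for the product of two self-adjoint operators, we obtain the bound

$$r(A-I) \le \tfrac12\, r(D^{2*}_{21}D^1_{22}D^2_{21})\, r(K+K^*) + 2\,[r(D^1_{12}D^2_{21}D^{2*}_{21}D^{1*}_{12})]^{1/2}\,[r(P^*_{1,1}P_{1,1})]^{1/2} \le \tfrac12\,\rho_3\, r(K+K^*) + 2\rho_1\rho_2 .$$

But note that

$$r(K+K^*) = \sup_{\gamma\in\Gamma_1} \bigl[\,|\langle \gamma, (K+K^*)\gamma\rangle_1| \,/\, \langle\gamma,\gamma\rangle_1\bigr] = 2 \sup_{\gamma\in\Gamma_1} \bigl[\,|\langle\gamma, K\gamma\rangle_1| \,/\, \langle\gamma,\gamma\rangle_1\bigr]$$

and since, from the Cauchy-Schwarz inequality of inner products,

$$|\langle\gamma, K\gamma\rangle_1|^2 \le \langle\gamma,\gamma\rangle_1\, \langle K\gamma, K\gamma\rangle_1 ,$$

we have

$$r(K+K^*) \le 2 \sup_{\gamma\in\Gamma_1} \bigl[\langle K\gamma, K\gamma\rangle_1 / \langle\gamma,\gamma\rangle_1\bigr]^{1/2} = 2 \sup_{\gamma\in\Gamma_1} \bigl[\langle \gamma, K^*K\gamma\rangle_1 / \langle\gamma,\gamma\rangle_1\bigr]^{1/2} = 2\,[r(K^*K)]^{1/2} \le 2\rho_4 .$$

Thus, $r(A-I) \le \rho_3\rho_4 + 2\rho_1\rho_2 < 1$, implying that the spectrum of the self-adjoint operator $A-I$ is uniformly in the unit sphere. Hence, $A$ is strongly positive. □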

For the special class of strictly convex team problems (cf. §3) with multiple probability measures, several simplifications can be made. In this case eq. (31) simplifies to

$$\gamma(y_1) = D^1_{12}D^{1*}_{12}\bigl\{E^1[E^2[\gamma(y_1)|y_2]\,|\,y_1] + g^1(y_1)E^2[g^2(y_2)(E^1[\gamma(y_1)|y_2] - E^2[\gamma(y_1)|y_2])\,|\,y_1]\bigr\} + F^1_1E^1[x|y_1]$$
$$\qquad + D^1_{12}F^1_2\, g^1(y_1)E^2[g^2(y_2)\{E^1[x|y_2] - E^2[x|y_2]\}\,|\,y_1] + D^1_{12}F^1_2\,E^1[E^2[x|y_2]\,|\,y_1],$$

and in Condition (3) inequalities (40a) are replaced by the single inequality

$$[r(D^1_{12}D^{1*}_{12}D^1_{12}D^{1*}_{12})]^{1/2} = \langle\langle D^1_{12}D^{1*}_{12}\rangle\rangle_1 \le \rho_1 = \rho_3$$

where $\rho_1$ can be taken to be less than one. Hence, (39) reads

$$(2\rho_2 + \rho_4) < 1/\rho_1 . \tag{42}$$


We now summarize these results as a corollary to Thm. 2:

Corollary 3. Under Conditions (1)-(2) of §2, and (42) given above, the strictly convex quadratic team problem, with multiple probability measures and asymmetric mode of decision making, admits a unique Stackelberg equilibrium solution $(\gamma^s, T_2[\gamma^s])$, where $\gamma^s$ is the limit of the iterative scheme (35) with

$$(Z\gamma)(y_1) = D^1_{12}D^{1*}_{12}\bigl[((P_{1,1} + P^*_{1,1})\gamma)(y_1) - g^1(y_1)E^2[g^2(y_2)E^2[\gamma(y_1)|y_2]\,|\,y_1]\bigr]$$

and $T_2$ is given by (22).


Remark 3. When the original problem is a Stackelberg game, but the probability measures are identical, a study of the original condition (36) reveals the inequality

$$r(Z) \le r\bigl(D^1_{12}D^2_{21} + D^{2*}_{21}D^{1*}_{12} - D^{2*}_{21}D^1_{22}D^2_{21}\bigr) \le \rho < 1 .$$

This is the existence condition associated with the standard stochastic Stackelberg game, which corroborates the earlier result obtained in [25]. □

We  now  conclude  this  section  by  presenting  the  counterpart  of  Corollary  2 
in  the  present  context,  which  provides  a  set  of  (simpler)  sufficient  conditions  for 
(40b)  to  be  satisfied: 

Corollary 4. For a given pair $(\rho_2, \rho_4)$, the first and second inequalities of (40b) are satisfied if, respectively,


g1(y1)  /  g2 (n)P^  ,  (dnU  =  y  ) 
Y2  y2|yl  1 


and 


S1  (y  ) /  ! s“ Cn ) | (dn ; £  =  y  )  J  S1(b)P“  (dbiy.,  =  n) 
1  2 ; ' 1  1  Y,  yi:y2 


are uniformly bounded from above by $(\rho_2)^2$ and $(\rho_4)^2$,


(43a) 

(43b)

Furthermore, if probability densities exist (with respect to the Lebesgue measure), these conditions can be expressed in terms of the corresponding probability density functions $p^i(\cdot)$ as follows:


p^i>  /  py2(n)  2 

~T~  ?2  -T~~  p y2!y/n'yi^dn  :  (c,2y 

py,(yi>  py,(n)  2 


(44a) 


%<*!>  *2 


ri)db^(o  )2 
(44b) 


Proof. For (43a)-(43b) see Appendix C; (44a)-(44b), however, follow readily from (43a)-(43b). □

5. Jointly Gaussian Distributions

In decision and control theory, one appealing class of probability distributions is the Gaussian distribution, because it leads to tractable problems admitting, in most cases, closed-form solutions. Indeed, when the probability measures of the two DM's are identical and Gaussian, equilibrium solutions have been shown to be affine functions of the observations for (i) quadratic stochastic team problems defined on Euclidean spaces [7], (ii) quadratic stochastic Nash games on Euclidean spaces [8], (iii) quadratic continuous-time stochastic team problems [9], (iv) quadratic stochastic Stackelberg games on Euclidean spaces [25], and (v) quadratic continuous-time stochastic Stackelberg games [26]. In this section, we investigate possible extensions of this appealing structural feature to the case when discrepancies exist between the subjective Gaussian distributions, as reflected in the covariances of the random vectors $(y_1, y_2)$. We could also have included discrepancies in the perceptions of the mean values, but such a more general treatment does not contribute substantially to the qualitative nature of the results obtained in the sequel, and besides it makes the expressions notationally cumbersome. The interested reader can find the relevant expressions for the nonzero mean case in [27].

We first introduce notation and terminology, and delineate Conditions (1) and (2) of §2 (§5.1). Then, we study the case of symmetric mode of decision making in §5.2, and show that the unique equilibrium solution of Thm. 1 is linear. Finally, in §5.3 we treat the case of asymmetric mode of decision making, and show that (in contradistinction with the result of §5.2) the unique Stackelberg solution of Thm. 2 is generically nonlinear.

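5.1. Notation and Terminology

Let $(x, y_1, y_2)$ be zero-mean Gaussian random vectors under both $P^1$ and $P^2$, with

$$\text{covariance}(y_1, y_2) = \text{cov}(y) = \Sigma^i_y = \begin{pmatrix} \Sigma^i_{y_1} & \Sigma^i_{y_1y_2} \\ \Sigma^i_{y_2y_1} & \Sigma^i_{y_2} \end{pmatrix} > 0, \quad \text{under } P^i, \tag{45a}$$

$$\text{covariance}(x, y_1, y_2) = \text{cov}(x, y) = \Sigma^i = \begin{pmatrix} \Sigma^i_{x} & \Sigma^i_{xy} \\ \Sigma^i_{yx} & \Sigma^i_{y} \end{pmatrix} > 0, \quad \text{under } P^i. \tag{45b}$$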

These probability distributions clearly satisfy the absolute continuity condition (Condition (1)) of Thms. 1 and 2. Furthermore, since

$$g^i(\xi) = \bigl(\det \Sigma^i_{y_i} / \det \Sigma^j_{y_i}\bigr)^{1/2} \exp\{-\tfrac12\, \xi' W_i\, \xi\} \tag{46a}$$

$$W_i \triangleq (\Sigma^j_{y_i})^{-1} - (\Sigma^i_{y_i})^{-1}, \qquad j \ne i, \tag{46b}$$

the uniform boundedness condition (Condition (2)) of Thms. 1 and 2 is satisfied whenever

$$W_i \ge 0, \qquad i = 1, 2. \tag{47}$$

After  making  these  observations,  let  us  introduce  the  additional  notation 

•  -1 


4  r~^m4  - 


N  .  )Mi,  -  M .  .  B  .  M.  . 
i  *  li  ij  ]  Ji 


B .  M . .  +  W. 
J  *  JJ  J 


(48a) 

(48b) 


/  M  M 

11  12 


21  22 


,  _i 


„  i 


-1 


yly2 


(48c) 


y2yl  >2 


q1  4  [det  E^  .det  Z*  /det  E^  .det  B  .det  E^]1/2 
'  i  '  j  '  j 


(48d) 


in  terms  of  which  we  evaluate  (21a)  [using  standard  properties  of  Gaussian 
distributions]  to  be 

$$g^i(y_i)\,E^j[g^j(y_j)\,|\,y_i] = q^i \exp\{-\tfrac12\, y_i' N_i\, y_i\} \tag{49}$$

We are now in a position to specialize the results of Thms. 1 and 2 to Gaussian distributions and obtain some explicit results.
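To make the boundedness requirement on the R-N derivatives concrete, here is a minimal scalar sketch. It assumes the ratio of two zero-mean Gaussian densities with variances `var_j` (numerator) and `var_i` (denominator), which in the scalar case has the form $(\sigma_i^2/\sigma_j^2)^{1/2}\exp\{-\tfrac12 W \xi^2\}$ with $W = 1/\sigma_j^2 - 1/\sigma_i^2$. The function names and the numerical variances are hypothetical.

```python
import math


def rn_derivative(xi, var_i, var_j):
    """Density ratio N(0, var_j) / N(0, var_i) evaluated at xi (scalar case).

    Equals sqrt(var_i / var_j) * exp(-0.5 * W * xi**2),
    with W = 1/var_j - 1/var_i.
    """
    w = 1.0 / var_j - 1.0 / var_i
    return math.sqrt(var_i / var_j) * math.exp(-0.5 * w * xi * xi)


def is_uniformly_bounded(var_i, var_j):
    """The ratio is bounded in xi exactly when W >= 0, i.e. var_j <= var_i."""
    return (1.0 / var_j - 1.0 / var_i) >= 0.0


# Hypothetical variances: one DM perceives variance 2.0, the other 1.5.
grid = [0.5 * k for k in range(-20, 21)]
peak = max(rn_derivative(x, 2.0, 1.5) for x in grid)
print(is_uniformly_bounded(2.0, 1.5), peak)      # bounded, largest value at xi = 0
print(is_uniformly_bounded(1.5, 2.0),
      rn_derivative(10.0, 1.5, 2.0))             # unbounded, grows with |xi|
```

The second call shows why the sign condition matters: when the numerator density has the heavier tails, the ratio grows without bound and no uniform bound exists.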

5.2. Symmetric Mode of Decision Making

In order to apply Thm. 1 to the Gaussian decision problem formulated above, we first explore the satisfaction of various conditions given there. We have already shown above that Condition (1) is always satisfied and Condition (2) is satisfied whenever $W_i \ge 0$. For the remaining condition we study inequalities (20b). The second
of  these  is  satisfied,  for  a  given  if  (using  (21a))  expression  (49)  is  uniformly 
bounded  in  y  ,  and  this  bound  is  no  greater  than  j ^ •  For  uniform  boundedness  of  (49) 
it  is  necessary  and  sufficient  that 

$$N_i \ge 0, \tag{50a}$$

under  which  the  latter  condition  becomes 


i  .  i.  2 
q  <  (o?) 


Hence  going  back  to  (20a) ,  the  condition 


<  l/qi  ,  for  at  least  one  i=l,2, 
ij  ji  l  1 


(50b) 


becomes  sufficient  for  (19) .  We  are  now  in  a  position  to  state  and  prove  the 
following  theorem: 

Theorem 3. Let (47) hold for i=1,2, and (50a)-(50b) hold for at least one i. Then, the quadratic Gaussian decision problem formulated in this section admits a unique stable Nash equilibrium solution $(\gamma_1^o, \gamma_2^o)$, where $u_i^o = \gamma_i^o(y_i)$ are linear in $y_i$, and are given by

$$\gamma_i^o(y_i) = L_i y_i, \qquad i = 1, 2. \tag{51}$$

Here, $L_i: \mathbb{R}^{m_i} \to U_i$ are bounded linear operators, constituting the unique solution to the linear operator equations

.-1  .  ,-l 

l.v,  -  d^.d^ .l.:j  :J  l1  -  y 

1-1  i]  JU  Fj  FjFi  >i  1 


'52) 


-  Z-*  Z 1  l 

ij  j  :<y j  Fj  yy,  -j 


-i  .-i 

1  -1  v  !7\- 

/i  i~xv.  v.  i=0,  Vv.-R  ,  i-1,2. 

■  i  -  i  -  l 
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Proof.  The  existence  and  uniqueness  of  the  solution  follows  from  Thm.  1,  Corollary  2, 

and  the  discussion  that  precedes  the  statement  of  the  theorem.  The  linearity  of  this 

unique  solution,  on  the  other  hand,  follows  by  noting  that  if  the  pair  is 

taken  to  be  linear  in  (y^^)  in  (14)  ,  all  the  terms  of  the  sequence  are  linear,  and 

hence  the  limit  (which  exists  as  already  proven)  is  linear.  Hence,  choosing  as  in 
m. 

(51),  where  L.  :  &  -*■  U .  are  bounded  linear  operators,  substituting  this  into  (14) 

m. 

and  requiring  it  to  hold  for  all  y^S]R  (since  all  probability  measures  are  Gaussian) 
leads  to  the  unique  relations  (52).  a 
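The linearity argument in the proof (linear starting policies produce linear iterates, hence a linear limit) can be mimicked in a scalar toy model in which each DM's policy is summarized by a single gain. The sketch below is only an illustration under that assumption: the update coefficients `c12`, `c21`, `f1`, `f2` are hypothetical and are not the quantities appearing in (14) or (52).

```python
def best_response_iteration(c12, c21, f1, f2, L1=0.0, L2=0.0, steps=200):
    """Alternate updates of two scalar gains (a toy stand-in for the response iteration).

    Each update maps a linear policy u_j = L_j * y_j to a linear policy
    u_i = L_i * y_i, so every iterate stays linear. c12, c21, f1, f2 are
    hypothetical coefficients, not quantities taken from (52).
    """
    history = [(L1, L2)]
    for _ in range(steps):
        # Simultaneous update: both right-hand sides use the previous gains.
        L1, L2 = c12 * L2 + f1, c21 * L1 + f2
        history.append((L1, L2))
    return history


def fixed_point(c12, c21, f1, f2):
    """Closed-form solution of L1 = c12*L2 + f1, L2 = c21*L1 + f2."""
    L1 = (f1 + c12 * f2) / (1.0 - c12 * c21)
    L2 = c21 * L1 + f2
    return L1, L2


hist = best_response_iteration(-0.6, -0.5, 1.0, 0.8)
print(hist[-1], fixed_point(-0.6, -0.5, 1.0, 0.8))
```

Here the product `c12 * c21` plays the role of the coupling whose magnitude must stay below one for the iteration to settle; the iterates then approach the fixed point regardless of the initial gains.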

Remark 4. Thm. 3 above extends the result of Thm. 2 of [8] on quadratic Gaussian games to the case when a common probability space does not exist and the decision spaces are not necessarily finite dimensional, and shows that the appealing linear structure prevails when there exists a discrepancy in the perceptions of the two DM's of the underlying probability measures. The existence and uniqueness conditions here are, however, more restrictive than those of [8], and also involve the probabilistic structure (see (50b)). Expression (21a) in the most general case (and (49) for the special Gaussian case) is not uniformly (in $y_i$) bounded by 1, unless $g^i(y_i) = g^j(y_j) = 1$ a.e. $P^i_{y_i}$ and $P^j_{y_j}$ (which corresponds to the case of equivalent probability measures), since R-N derivatives (if different from 1) will be both smaller and larger than unity on sets of nonzero measure. This then implies, in view of (47), and from (49), that $q^i \ge 1$, i=1,2, with the inequality being strict if $P^i_{y_i}$ is not equivalent to $P^j_{y_i}$ for at least one i=1,2, j≠i. In such a case, even for team problems, a stable equilibrium solution may not exist, particularly if the left-hand side of (50b) lies strictly between $1/q^i$ and 1 for at least one i=1,2; j≠i. This indicates, in general, the presence of a strong coupling between probabilistic and deterministic elements of the problem in terms of existence conditions. However, if the discrepancy between perceptions of the DM's on the probability measures (measured in terms of R-N derivatives) is sufficiently small, one would expect $q^i$ to be sufficiently close to unity, which ensures satisfaction of condition (50b) for a fairly general class of quadratic strictly convex Gaussian team problems (since $\|D_{ii}^{-1}D_{ij}\| = \|D_{jj}^{-1}D_{ji}\| = c < 1$ for such team problems). For further discussion on this point we refer the reader to [10]. □

In the statement of Thm. 1, the condition (47) places some severe restrictions on the second moments of the underlying distributions (in case a discrepancy exists), which may however be relaxed if we are willing to consider equilibrium policies in a more restricted space. More specifically, satisfaction of (47) ensures that regardless of what initial set of policies the DM's start the infinite recursion (15) with, every element of this sequence is well-defined, and under (50a)-(50b) it will converge to a unique limit which is linear; in other words, even if the DM's start with nonlinear policies, the end result will be a linear equilibrium solution. The condition (47) is restrictive, because we require (without imposing any constraints on the policy spaces) the sequence generated by (15) to be well-defined even with nonlinear starting conditions. However, if we restrict the team agents to linear policies from the outset, under Gaussian distributions (and following the argument used in the proof of Thm. 1) elements of the sequence (15) will be well-defined (without requiring (47)) and will converge to the equilibrium solution provided that (50a)-(50b) hold for at least one i=1,2. This line of reasoning then leads to the following result which we give without a proof.

Proposition 5. Let $\tilde\Gamma_i$ be the class of all linear policies in the form (51), with $L_i: \mathbb{R}^{m_i} \to U_i$ a bounded linear operator, i=1,2. On $\tilde\Gamma_1 \times \tilde\Gamma_2$ the statement of Thm. 1 is valid even if (47) does not hold true. □

We now interpret these results in the context of two examples, one of which is a scalar team problem and the other a continuous-time team problem, both with multiple subjective Gaussian probabilities.

Example 1. Consider a family of scalar Gaussian team problems, with

To investigate the applicability of Thm. 1 to this class of problems, let us first observe that condition (47) is satisfied if, and only if, both

$$0 < \mu \le 1, \qquad 0 < \eta \le 1. \tag{54}$$

For condition (50a), we evaluate $N_i$ and require it to be nonnegative for either i=1, or i=2:

$$N_1 = \frac{(\mu ab - c^2)(1-\mu)\,[\mu a^2 - c^2 - (1-\mu)ab]}{a\,[\mu^2 a^2 - (1-\mu)(\mu ab - c^2)]} \ge 0 \tag{55a}$$

$$N_2 = \frac{(\eta ab - e^2)(1-\eta)\,[\eta b^2 - e^2 - (1-\eta)ab]}{b\,[\eta^2 b^2 - (1-\eta)(\eta ab - e^2)]} \ge 0 \tag{55b}$$

Finally, condition (50b) dictates either

$$\mu a^2 |d|^2 < \eta\,[\mu^2 a^2 - (1-\mu)(\mu ab - c^2)] \tag{56a}$$

or

$$\eta b^2 |d|^2 < \mu\,[\eta^2 b^2 - (1-\eta)(\eta ab - e^2)] \tag{56b}$$

provided that the terms on the right-hand side are positive (if not, then the inequalities will accordingly change direction).

The set of values for $a, b, c, e, \mu, \eta$ that satisfy (54)-(56) is clearly not empty. To gain some further insight into these conditions, let us consider the class of team decision problems in which the discrepancies between the DMs' perceptions of the variances of different Gaussian random variables are relatively small, that is, there exist sufficiently small $\varepsilon_1 \ge 0$ and $\varepsilon_2 \ge 0$ such that $\mu = 1 - \varepsilon_1$, $\eta = 1 - \varepsilon_2$, and furthermore $e \approx c$, and $|c|$ is considerably smaller than both $a$ and $b$. Note that, when $\varepsilon_1 = \varepsilon_2 = 0$, conditions (54)-(56) are all satisfied (note that $|d| < 1$ because of strict convexity of the objective functional) regardless of the relative magnitudes of $e$ and $c$. Hence, when the discrepancy is only in the perceptions of the correlation between $y_1$ and $y_2$, the scalar quadratic Gaussian team problem always admits a stable equilibrium solution. Now, for nonzero, but positive, and sufficiently small $\varepsilon_1$ the dominating term in (55a) is

$$N_1 \approx \varepsilon_1 (\mu ab - c^2)(\mu a^2 - c^2) / \mu^2 a^3$$

which is positive, in view of (53) and the initial hypothesis that $|a/c| \gg 1$. Likewise, $N_2$ is positive whenever $0 < \varepsilon_2 \ll 1$ and $|b/e| \gg 1$. Furthermore, given a $\bar d$, $0 < \bar d < 1$, we can always find $\mu$ and $\eta$, both in (0,1), so that both (56a) and (56b) are satisfied whenever $|d| < \bar d$. Hence, the conclusion is that when the deviations of the perceptions of the DM's from the common Gaussian probability measures are incremental (and satisfying (54)), the linear equilibrium solution of the Gaussian scalar team problem retains its stability property (but, of course, at a different (possibly close, in norm) equilibrium point). □
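The small-discrepancy approximation used in Example 1 can be checked numerically. The sketch below assumes that (55a) reads $N_1 = (\mu ab - c^2)(1-\mu)[\mu a^2 - c^2 - (1-\mu)ab] / \{a[\mu^2a^2 - (1-\mu)(\mu ab - c^2)]\}$ and that the dominating term is $\varepsilon_1(\mu ab - c^2)(\mu a^2 - c^2)/\mu^2a^3$ with $\mu = 1 - \varepsilon_1$; the numerical values of `a`, `b`, `c` are hypothetical.

```python
def n1_exact(a, b, c, mu):
    """Expression on the left of (55a), under the reading stated in the lead-in."""
    num = (mu * a * b - c**2) * (1.0 - mu) * (mu * a**2 - c**2 - (1.0 - mu) * a * b)
    den = a * (mu**2 * a**2 - (1.0 - mu) * (mu * a * b - c**2))
    return num / den


def n1_leading_term(a, b, c, eps1):
    """Dominating term of N_1 for small eps1, with mu = 1 - eps1."""
    mu = 1.0 - eps1
    return eps1 * (mu * a * b - c**2) * (mu * a**2 - c**2) / (mu**2 * a**3)


a, b, c = 2.0, 3.0, 0.1     # hypothetical values with |a/c| >> 1
for eps1 in (1e-1, 1e-2, 1e-3):
    exact = n1_exact(a, b, c, 1.0 - eps1)
    approx = n1_leading_term(a, b, c, eps1)
    # The ratio exact/approx should approach 1 as eps1 shrinks.
    print(eps1, exact, approx, exact / approx)
```

For these values $N_1$ stays positive and the ratio of the exact expression to its leading term tends to one, in line with the argument that (55a) holds for incremental deviations.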

Example 2. As a second illustration of Thm. 1, for infinite-dimensional decision spaces we consider here a class of stochastic Gaussian team problems defined in continuous time. More specifically, let $U_1 = U_2 = L_2(0,T)$, the Hilbert space of all scalar-valued Lebesgue square-integrable functions on the bounded interval $[0,T]$, endowed with the standard inner product $\int_0^T u(t)v(t)\,dt$, for $u, v \in L_2$. Furthermore, let $Y_1 = Y_2 = \mathbb{R}$, and the Gaussian statistics have zero mean, and variances be as given in (53). Let $D^1_{11} = D^2_{22} = I$, the identity operator on $L_2$, and $D^1_{12} = D^{2*}_{21}$ be the Fredholm operator

$$D^1_{12}u = \int_0^T K(t,s)\,u(s)\,ds \tag{57}$$

where $K(t,s)$ is a continuous kernel on $0 \le t, s \le T$, and finally let $F^i_i = f_i(t)$, i=1,2, which are continuous functions on $[0,T]$.

Now, conditions (47) and (50a) depend only on the probabilistic structure, and are therefore again given by (54) and (55), respectively. For (50b), however, we have to obtain the counterpart of (56), by simply replacing $|d|^2$ with the norm of the operators $D^1_{12}D^{1*}_{12}$ and $D^{1*}_{12}D^1_{12}$, respectively. Since $D^{1*}_{12}u = \int_0^T K(s,t)\,u(s)\,ds$, the self-adjoint operator $D^1_{12}D^{1*}_{12}$ is given by

provided that the terms on the right-hand side are positive, where $\lambda$ is defined by (58a)-(58b).

Hence, under (54) and either (55a) and (59a) or (55b) and (59b), the continuous-time static decision problem formulated above admits a unique stable equilibrium solution, and this solution is given by (from Thm. 3):

$$\gamma_i(t, y_i) = k_i(t)\,y_i, \qquad i = 1, 2, \tag{60}$$

where $k_i(t)$ are continuous functions on $[0,T]$, satisfying
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kl(t)  "  ^  J  -^(t»s)k  (s)ds  -  (a2  e/ab)  f  K(t,s)f  (s)ds  -  (a1  /a)f  (t)=0  (61a) 

0  y2  0  1  x-yi  1 

T  T 

k2^  ”  /  ^(s,t)k.-(s)ds  -  (o2  c/ab)  j  K(s,t)f  (s)ds  -  (a2  /i)f  (t)=0.  (61b) 

0  x”l  0  *  ^2  2 


Note that $k_i(t)$ above stands for operator $L_i$ in (52), and we have already shown that a unique solution to both (61a) and (61b) exists in $L_2[0,T]$, under (54) and either (55a) and (59a) or (55b) and (59b), and this solution is also continuous.

Finally, if our interest lies only in the existence of a unique linear equilibrium solution (not necessarily stable), the required condition is unique solvability of the integral equations (61a)-(61b), for which a sufficient condition is [6]

$$(ec/ab)\,\lambda < 1$$

where $\lambda$ is defined by (58b). □

5.3. Asymmetric Mode of Decision Making

To obtain the counterpart of the results of §5.2 under the asymmetric mode of decision making, we first investigate the possibility for the unique solution of Thm. 2 to be linear. Towards this end we first observe that the decision problem will admit a unique linear solution if, and only if, equation (31) is satisfied by the decision rule

$$\gamma(y_1) = Ay_1 \tag{62}$$

for some linear bounded operator $A: \mathbb{R}^{m_1} \to U_1$. Hence, using (31), $A$ should be the solution of (by pulling $A$ out of the conditional expectations)

$$Ay_1 = D^1_{12}D^2_{21}A\,E^1[E^2[y_1|y_2]\,|\,y_1] + D^{2*}_{21}D^{1*}_{12}A\,g^1(y_1)E^2[g^2(y_2)E^1[y_1|y_2]\,|\,y_1]$$
$$\qquad - D^{2*}_{21}D^1_{22}D^2_{21}A\,g^1(y_1)E^2[g^2(y_2)E^2[y_1|y_2]\,|\,y_1]$$
$$\qquad + F^1_1E^1[x|y_1] + D^{2*}_{21}F^1_2\,g^1(y_1)E^2[g^2(y_2)E^1[x|y_2]\,|\,y_1]$$
$$\qquad - D^{2*}_{21}D^1_{22}F^2_2\,g^1(y_1)E^2[g^2(y_2)E^2[x|y_2]\,|\,y_1] + D^1_{12}F^2_2\,E^1[E^2[x|y_2]\,|\,y_1], \qquad \forall y_1 \in \mathbb{R}^{m_1}. \tag{63a}$$

Since the random variables are jointly Gaussian under both measures,

$$E^i[y_k|y_\ell] = S^i_{k\ell}\,y_\ell, \qquad i, k, \ell = 1, 2 \tag{63b}$$

$$E^i[x|y_\ell] = S^i_{0\ell}\,y_\ell, \qquad i, \ell = 1, 2, \tag{63c}$$

for some matrices $S^i_{k\ell}$ and $S^i_{0\ell}$. In view of this, (63a) can be rewritten as

$$Ay_1 = \bigl(D^1_{12}D^2_{21}AS^2_{12}S^1_{21} + F^1_1S^1_{01} + D^1_{12}F^2_2S^2_{02}S^1_{21}\bigr)y_1$$
$$\qquad + \bigl(D^{2*}_{21}D^{1*}_{12}AS^1_{12} - D^{2*}_{21}D^1_{22}D^2_{21}AS^2_{12} + D^{2*}_{21}F^1_2S^1_{02} - D^{2*}_{21}D^1_{22}F^2_2S^2_{02}\bigr)\,g^1(y_1)E^2[g^2(y_2)y_2\,|\,y_1]. \tag{64}$$
This then leads to the following Proposition:

Proposition 6. Let (47) and Condition (3) be satisfied, and either $P^1_{y_1} \ne P^2_{y_1}$ or $P^1_{y_2} \ne P^2_{y_2}$. Then, the quadratic Gaussian decision problem with asymmetric mode of decision making admits a linear (Stackelberg) equilibrium solution if, and only if,

(i) there exists a bounded linear operator $A: \mathbb{R}^{m_1} \to U_1$ satisfying

$$A = D^1_{12}D^2_{21}AS^2_{12}S^1_{21} + F^1_1S^1_{01} + D^1_{12}F^2_2S^2_{02}S^1_{21} \tag{65a}$$

and

(ii) this solution also satisfies

$$D^{2*}_{21}D^{1*}_{12}AS^1_{12} - D^{2*}_{21}D^1_{22}D^2_{21}AS^2_{12} + D^{2*}_{21}F^1_2S^1_{02} - D^{2*}_{21}D^1_{22}F^2_2S^2_{02} = 0. \tag{65b}$$

Proof. Since the "if" part is obvious in view of Thm. 2, we verify only the "only if" part of the proposition. [In what follows we adopt the notation $S \gneq 0$ to imply that the nonnegative definite matrix $S$ has at least one positive eigenvalue.] The proof proceeds by showing for three exclusive (and exhaustive) cases that $f(y_1) \triangleq g^1(y_1)E^2[g^2(y_2)y_2\,|\,y_1]$ is a nonlinear function of $y_1$.

(a) $P^2_{y_2} = P^1_{y_2}$, and $P^1_{y_1} \ne P^2_{y_1}$.

Here, $g^2(y_2) = 1$, and $g^1(y_1) = c_1 \exp\{-\tfrac12\, y_1'W_1y_1\}$, where $W_1 \gneq 0$, and $c_1 > 0$ is a constant. Hence, $f(y_1) = g^1(y_1)S^2_{21}y_1$, which is nonlinear since $W_1 \gneq 0$.

(b) $P^1_{y_2} \ne P^2_{y_2}$, and $P^1_{y_1} = P^2_{y_1}$.

Here, $g^1(y_1) = 1$, and $g^2(y_2) = c_2 \exp\{-\tfrac12\, y_2'W_2y_2\}$, where $W_2 \gneq 0$, and $c_2 > 0$ is a constant. In this case, $f$ can be evaluated to be

$$f(y_1) = c\,(V + W_2)^{-1}VS^2_{21}y_1 \exp\{-\tfrac12\, y_1'By_1\}$$

where  V  =  E2 { (y2  -  S2^)  (y2  -  S2iyi>"} 

B  "  s22?sh  -  >  o, 

and $c$ is a constant. Since $W_2 \gneq 0$, $B$ has at least one positive eigenvalue, and hence $f(y_1)$ is again nonlinear in $y_1$.

(c) $P^1_{y_2} \ne P^2_{y_2}$ and $P^1_{y_1} \ne P^2_{y_1}$.

In this case, following the same lines as above, we find

$$f(y_1) = c\,(V + W_2)^{-1}VS^2_{21}y_1 \exp\{-\tfrac12\, y_1'(B + W_1)y_1\}$$

which is nonlinear since both $B \ge 0$, $W_1 \ge 0$.

Hence, in view of the preceding analysis, a necessary condition for existence of a solution to (64) is that the last term should vanish (i.e. (65b)) for an $A$ that solves (65a). □

Remark 5. A sufficient condition for (65a) to admit a unique solution in the Banach space of linear bounded operators mapping $\mathbb{R}^{m_1}$ into $U_1$ is

$$r(D^1_{12}D^2_{21}D^{2*}_{21}D^{1*}_{12})\; r(S^2_{12}S^1_{21}S^{1*}_{21}S^{2*}_{12}) < 1,$$

which is clearly satisfied under Condition (3). □
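A scalar caricature may help to see how restrictive the second requirement of Prop. 6 is. The sketch below assumes the scalar analogues $A = d_{12}d_{21}A\,s^2_{12}s^1_{21} + f_{11}s^1_{01} + d_{12}f_{22}s^2_{02}s^1_{21}$ for (65a) and $d_{21}d_{12}A\,s^1_{12} - d_{21}d_{22}d_{21}A\,s^2_{12} + d_{21}f_{12}s^1_{02} - d_{21}d_{22}f_{22}s^2_{02} = 0$ for (65b). All parameter names and numerical values are hypothetical.

```python
def solve_65a(d12, d21, s2_12, s1_21, f11, s1_01, f22, s2_02, iters=500):
    """Successive approximation for the scalar analogue of (65a)."""
    contraction = d12 * d21 * s2_12 * s1_21
    if abs(contraction) >= 1.0:
        raise ValueError("scalar analogue of the Remark 5 condition fails")
    const = f11 * s1_01 + d12 * f22 * s2_02 * s1_21
    A = 0.0
    for _ in range(iters):
        A = contraction * A + const
    return A


def residual_65b(A, d12, d21, d22, s1_12, s2_12, f12, s1_02, f22, s2_02):
    """Left-hand side of the scalar analogue of (65b); zero iff requirement (ii) holds."""
    return (d21 * d12 * A * s1_12 - d21 * d22 * d21 * A * s2_12
            + d21 * f12 * s1_02 - d21 * d22 * f22 * s2_02)


# Hypothetical parameter values.
p = dict(d12=0.5, d21=0.4, d22=1.0, s1_12=0.35, s2_12=0.3, s1_21=0.6,
         f11=1.0, f12=1.0, f22=1.0, s1_01=0.8, s1_02=0.7, s2_02=0.5)

A_hat = solve_65a(p["d12"], p["d21"], p["s2_12"], p["s1_21"],
                  p["f11"], p["s1_01"], p["f22"], p["s2_02"])
res = residual_65b(A_hat, p["d12"], p["d21"], p["d22"], p["s1_12"], p["s2_12"],
                   p["f12"], p["s1_02"], p["f22"], p["s2_02"])
print("A =", A_hat, "residual of (65b) =", res)

# The residual is affine in f12, and A_hat does not depend on f12,
# so one specially tuned value of f12 makes the residual vanish.
f12_star = p["f12"] - res / (p["d21"] * p["s1_02"])
res_star = residual_65b(A_hat, p["d12"], p["d21"], p["d22"], p["s1_12"], p["s2_12"],
                        f12_star, p["s1_02"], p["f22"], p["s2_02"])
print("tuned f12 =", f12_star, "residual =", res_star)
```

For arbitrarily chosen parameters the residual is nonzero, while a single tuned value of one parameter makes it vanish, which is the sense in which the conditions are non-void but generically not met.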

The conditions of Prop. 6 are clearly non-void, because, given the unique solution of (65a), it may be possible to find $F^1_2$, $F^2_2$, $S^1_{02}$ and $S^2_{02}$ so that (65b) is satisfied. However, it should also be clear that satisfaction of (65b) places some severe restrictions on the parameters of the problem, which in general will not be met. Hence, it is fair to say that, if either $P^1_{y_1} \ne P^2_{y_1}$ or $P^1_{y_2} \ne P^2_{y_2}$, generically the problem does not admit a linear equilibrium solution, even if it is a team problem; that is:

Corollary 5. If either $P^1_{y_1} \ne P^2_{y_1}$ or $P^1_{y_2} \ne P^2_{y_2}$ (or both), the quadratic Gaussian decision problem does not admit (generically) a linear Stackelberg equilibrium solution. The unique solution, which exists under (47) and Condition (3), is nonlinear.

The conditions of the preceding Corollary involve only the marginal distributions of $y_1$ and $y_2$; in the complement of these conditions we can derive the following linear solution:

1  _  2 

Proposition  7.  For  the  quadratic  Gaussian  decision  problem,  let  both  Py  =Py  and 

9  12 

Pj;  -Py  (but  not  necessarily  Py  y  ’  and  £V6n  ^^^l^  *  Then’  if 

+  tr(D^20^)]1/2  <  1  <“> 

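the problem admits a unique Stackelberg equilibrium solution for DM1 (the leader) which is linear in $y_1$:

$$\gamma_1(y_1) = Ay_1 \tag{67a}$$

where $A: \mathbb{R}^{m_1} \to U_1$ is the unique bounded linear operator solving

$$Ay_1 = \bigl(D^1_{12}D^2_{21}AS^2_{12}S^1_{21} + D^{2*}_{21}D^{1*}_{12}AS^1_{12}S^2_{21} - D^{2*}_{21}D^1_{22}D^2_{21}AS^2_{12}S^2_{21} + F^1_1S^1_{01} + D^1_{12}F^2_2S^2_{02}S^1_{21}$$
$$\qquad + D^{2*}_{21}F^1_2S^1_{02}S^2_{21} - D^{2*}_{21}D^1_{22}F^2_2S^2_{02}S^2_{21}\bigr)\,y_1, \qquad \forall y_1 \in \mathbb{R}^{m_1}, \tag{67b}$$
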

and  are  defined  by  (31b)-(31c) ,  and  is  defined  by  E^xV]  =  . 

12  12  12  ... 

Proof. When $P^1_{y_1} = P^2_{y_1}$ and $P^1_{y_2} = P^2_{y_2}$, $g^1(y_1) = g^2(y_2) = 1$ and hence Conditions (1) and (2) of Thm. 2 are always satisfied, and in Condition (3), $\rho_2 = \rho_4 = 1$. Then, (66) is the counterpart of (39), and hence existence and uniqueness follow from Thm. 2. Linearity, on the other hand, follows by noting that if we start iteration (35) with $\gamma^{(0)} \equiv 0$, since $g^1(y_1) = g^2(y_2) = 1$ every term will be linear in $y_1$ (see also (64)), and hence the limit (which exists by Thm. 2) will be linear. Then, substituting $\gamma(y_1) = Ay_1$ in (31), we obtain (67b), by simply letting $g^1(y_1) = g^2(y_2) = 1$ in (64). □


When there is a discrepancy between the DM's perceptions of the variances of either $y_1$ or $y_2$, Prop. 7 will not hold, and the problem will admit (generically) a nonlinear equilibrium solution, as proven earlier in Prop. 6 and Corollary 5. In this case, an explicit closed-form solution cannot be obtained; however, an approximate solution can be derived by using the iteration (35) which, for the Gaussian problem, becomes

$$\gamma^{(k+1)}(y_1) = D^1_{12}D^2_{21}E^1[E^2[\gamma^{(k)}(y_1)|y_2]\,|\,y_1] + D^{2*}_{21}D^{1*}_{12}\,g^1(y_1)E^2[g^2(y_2)E^1[\gamma^{(k)}(y_1)|y_2]\,|\,y_1]$$
$$\qquad - D^{2*}_{21}D^1_{22}D^2_{21}\,g^1(y_1)E^2[g^2(y_2)E^2[\gamma^{(k)}(y_1)|y_2]\,|\,y_1] + \bigl(F^1_1S^1_{01} + D^1_{12}F^2_2S^2_{02}S^1_{21}\bigr)y_1$$
$$\qquad + \bigl(D^{2*}_{21}F^1_2S^1_{02} - D^{2*}_{21}D^1_{22}F^2_2S^2_{02}\bigr)\,g^1(y_1)E^2[g^2(y_2)y_2\,|\,y_1]. \tag{68}$$

If we start this iteration with $\gamma^{(0)}(y_1) = 0$, or any linear function of $y_1$, at every iteration we obtain linear combinations of terms of the type $A^{(k)}y_1$ and $B^{(k)}_\ell y_1 \exp\{-\tfrac12\, y_1'V^{(k)}_\ell y_1\}$, where $A^{(k)}$ and $B^{(k)}_\ell$ are linear operators, and $V^{(k)}_\ell \ge 0$ is an $m_1 \times m_1$ matrix. Since this is a successive approximation technique under Condition (3), even stopping the iteration after a finite number of terms will provide a solution sufficiently close to the unique optimum. Hence, generically a suboptimal policy for DM1, which is sufficiently close to the unique solution of (31), will be of the form

$$\gamma_1^{(N)}(y_1) = A^{(N)}y_1 + \sum_{\ell \le N} B^{(N)}_\ell\, y_1 \exp\{-\tfrac12\, y_1'V^{(N)}_\ell y_1\}$$

where $N$ is a sufficiently large integer (related to the number of iterations taken in (68)), and $A^{(N)}$, $B^{(N)}_\ell$, $V^{(N)}_\ell$ are generated via the iteration (68). Note that as $N \to \infty$ this solution will uniformly converge to the unique optimum.
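To give a feel for the structure of such a truncated policy, the following scalar sketch evaluates a function of the form $Ay + \sum_\ell B_\ell\, y \exp\{-\tfrac12 V_\ell y^2\}$ and measures how far it is from being linear. The coefficients are hypothetical placeholders; they are not produced by iteration (68), which would require the conditional expectations of the actual problem.

```python
import math


def suboptimal_policy(y, A, terms):
    """Evaluate A*y + sum_l B_l * y * exp(-0.5 * V_l * y**2) for scalar y.

    `terms` is a list of (B_l, V_l) pairs with V_l >= 0. All numbers used
    below are hypothetical; they are not generated by iteration (68).
    """
    return A * y + sum(B * y * math.exp(-0.5 * V * y * y) for B, V in terms)


terms = [(0.30, 0.5), (-0.10, 1.2)]
A = 0.8
for y in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(y, suboptimal_policy(y, A, terms))

# Departure from linearity: gamma(2y) differs from 2*gamma(y) in general.
gap = suboptimal_policy(2.0, A, terms) - 2.0 * suboptimal_policy(1.0, A, terms)
print("linearity gap:", gap)
```

The exponential factors vanish for large $|y|$, so in this sketch the policy behaves like the linear term $Ay$ far from the origin and is visibly nonlinear near it; when every $V_\ell$ is zero the policy collapses to a linear one.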

Yet another suboptimal solution can be obtained by restricting DM1's policies, at the outset, to linear functions of $y_1$, i.e. to the form (62) where $A$ is a variable linear operator. DM2's response to any such policy will also be linear (in $y_2$), thus making $T_2$ in (10) a linear operator. Then, the problem faced by DM1 is minimization of (11), with $\gamma(y_1) = Ay_1$, over all linear bounded operators $A$. The solution of this minimization problem will provide DM1 with a linear policy that is (in general) inferior to the limiting solution of (68), unless, of course, $g^1(y_1) = g^2(y_2) \equiv 1$, in which case the two solutions will be the same (satisfying (67b)). We do not pursue here the details of the derivation of the best linear solution for the general case (as outlined above).

Furthermore, it is possible to work out the various conditions for the special cases of the scalar and continuous-time team problems (formulated as in Examples 1 and 2) and write down the equilibrium solution explicitly whenever it is linear. Such an analysis would routinely follow the lines of the discussion of Examples 1 and 2, and hence will not be included here, mainly because of space limitations.


6. Discussion of Possible Extensions, and Concluding Remarks

In the preceding sections, we have developed an equilibrium theory for two-person quadratic decision problems with static information patterns, wherein the decision makers (DM's) do not necessarily have the same perception of the underlying probability space; that is, our formulation allows for discrepancies in the way different DM's perceive the probability space. As indicated earlier, when such discrepancies exist, even team problems have to be analyzed in the framework of nonzero-sum stochastic games, and in such a framework the Nash solution concept is the most suitable equilibrium concept if the DM's occupy symmetric (non-hierarchical) positions in the decision process, and the Stackelberg solution concept becomes more meaningful if there is a hierarchy in decision making.

Section 3 of the paper has provided a set of sufficient conditions for existence and uniqueness of Nash equilibrium in the case of symmetric mode of decision making, with the additional feature that it be stable. This is an appealing feature of the solution because, in order to arrive at equilibrium (as a consequence of an infinite number of response iterations), each DM does not have to know the subjective probability measures perceived by the other DM, but has to know only the policy adopted by the other DM at the most recent step of the iteration.

In Section 4 we have presented a counterpart of the results of §3 under the asymmetric mode of decision making. The conditions derived ensure that the equilibrium policy of the leader can be obtained as the limit of an infinite sequence which involves conditional expectations under two different probability measures. This sequence [(35), (27)] is structurally different from its counterpart in §3 (see (14)), even for team problems, and it contains R-N derivatives of the two probability measures as multiplying factors (which are absent in (14)).

In Section 5 we have shown that when the underlying probability distributions belong to a Gaussian class, the Nash equilibrium solution will be linear (affine, if mean values are nonzero) in the available static measurements, with the gain operator satisfying a Lyapunov-type operator equation (cf. Thm. 3). This solution and the associated existence conditions have been studied further in the context of two examples which involve scalar and continuous-time stochastic team problems with multiple probability models. In developing a counterpart of Thm. 3 for asymmetric mode of decision making, we have arrived at a seemingly surprising (unexpected) result: the unique Stackelberg equilibrium solution being generically nonlinear in the measurements (even under Gaussian multiple probability measures). This constitutes the first unique nonlinear solution reported in the literature for a quadratic Gaussian static game or team problem.† It should be noted that we have not given a closed-form expression for this nonlinear solution, but have instead provided a recursive scheme which generates admissible policies that come arbitrarily close to the optimum solution.

Several extensions of the results presented in this paper seem to be possible. Firstly, we should note that the general Hilbert-space framework adopted in this paper and the general solutions presented for the Gaussian problems in Section 5 (Thms. 3 and 4) apply to other models also, such as the ones similar to the continuous-time team problem treated in [9] and the Stackelberg problem of [26], but with the DM's having different probability models. It is expected that some explicit results (closed-form solutions) can also be obtained in these cases, but this point has not been pursued in this paper and is left for future research.

Another possible extension of the results of this paper would be to the class of problems in which the random state of nature (i.e. $x$) as well as the measurements ($y_i$) are stochastic processes. The general theories of Sections 3 and 4 could easily be extended so as to encompass this class of problems also, provided that the problem is set up under the right mathematical assumptions. In particular, if the random variables are taken to be Hilbert space valued weak random variables, with the inner product satisfying some continuity and boundedness conditions [11], Thms. 1-4 directly apply to this more general class of decision problems, when interpreted in the right framework. Furthermore, extensions to dynamic (multi-stage) problems are also possible, by adopting the framework of (say) [8] for the linear-quadratic-Gaussian problem. Then, the unique Nash equilibrium solution under the one-step-delay observation sharing pattern can be obtained by basically following the approach of [8] and utilizing in the recursive derivation Thm. 3 of this paper instead of Thm. 2 of [8]. Details of this derivation are, however, rather involved, and will be reported elsewhere.

† Reference [12] also reports on existence of nonlinear (Nash) solutions for quadratic Gaussian nonzero-sum games, but there the nonlinear solution is one of many solutions, one of which is linear, and is due to nonunique intersection of reaction functions (which disappears under appropriate conditions).

Regarding  the  Nash  equilibrium  solution,  yet  another  possible  extension  would 
be  to  multiple  decision-maker  problems  with  more  than  two  (say,  N)  DM's.  Even  though 
the  definition  of  Nash  equilibrium  (cf.  Def.  1)  admits  a  natural  (unique)  extension  to 
such  problems,  that  of  stable  equilibrium  (cf.  Def.  2)  does  not  extend  in  a  unique  way. 
One  viable  alternative  is  to  assume  that  each  DM  reacts  optimally  to  the  set  of  most 
recent  policies  of  all  the  other  DM's,  which  leads  to  a  set  of  N  relations  similar  to 
(9). In this case, (12) will be replaced by N equations with the right-hand-side
expressions involving N-1 policies of different DM's. However, the line of reasoning
that  took  us  from  (13)  to  (14)  does  not  have  a  counterpart  if  N>2,  and  in  general  it 
is  not  possible  to  obtain  N  recursion  relations  each  of  which  involves  only  one  DM's 
policies  at  consecutive  stages.  Then,  the  counterpart  of  (13)  will  have  to  be  treated 
as  a  "multi-valued"  operator  equation,  in  which  context  an  existence  and  uniqueness 
result  will  have  to  be  established.  This  seems  to  be  a  challenging  problem  whose 
solution  requires  somewhat  different  mathematical  techniques  than  the  ones  employed  in 
this  paper. 
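The simultaneous-response scheme described in this paragraph can be illustrated on a finite-dimensional toy problem. The Python sketch below is only an illustration under our own assumptions and is not the operator relations (9) or (12) of the paper: it takes three DM's with scalar decisions, and the quadratic cost structure, the matrix `COUPLING`, the vector `OFFSETS`, and both function names are hypothetical. Each DM reacts optimally to the most recent decisions of the other two, and the iteration settles here because the assumed coupling matrix has spectral radius 0.4.

```python
def best_response_step(policies, coupling, offsets):
    """One simultaneous round: each DM reacts to the others' most recent policies."""
    n = len(policies)
    return [
        offsets[i] - sum(coupling[i][j] * policies[j] for j in range(n) if j != i)
        for i in range(n)
    ]


def iterate_best_responses(coupling, offsets, start, tol=1e-12, max_rounds=10_000):
    """Repeat the simultaneous round until consecutive iterates agree to within tol."""
    current = list(start)
    for round_number in range(1, max_rounds + 1):
        updated = best_response_step(current, coupling, offsets)
        gap = max(abs(new - old) for new, old in zip(updated, current))
        current = updated
        if gap < tol:
            return current, round_number
    raise RuntimeError("best-response iteration did not settle")


# Hypothetical three-DM example (an assumption, not taken from the paper):
# DM i minimizes 0.5*u_i**2 + u_i * sum_{j != i} c_ij * u_j - b_i * u_i,
# so its best response is u_i = b_i - sum_{j != i} c_ij * u_j.
COUPLING = [[0.0, 0.2, 0.2],
            [0.2, 0.0, 0.2],
            [0.2, 0.2, 0.0]]
OFFSETS = [1.0, 1.0, 1.0]

equilibrium, rounds = iterate_best_responses(COUPLING, OFFSETS, start=[0.0, 0.0, 0.0])
print(equilibrium, rounds)
```

For this symmetric choice every component of `equilibrium` approaches 1/1.4. With stronger coupling (spectral radius of the coupling matrix at or above one) the same loop would fail to settle, which is the finite-dimensional shadow of the existence and uniqueness question raised above for N > 2.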

One  source  of  motivation  for  the  research  reported  in  this  paper  has  been 
(as  discussed  in  Section  1)  the  desire  to  investigate  the  sensitivity  and  robustness 
of  team-optimal  solutions  (in  stochastic  teams)  to  independent  variations  in  the 
perceptions of the DM's of the underlying probability space (and, in particular, the probability measure). The analysis of this paper indeed provides a framework for such a study when the roles of the DM's are either symmetric or asymmetric, since an equilibrium theory has been established in both cases within an "ε-neighborhood" of the team-optimal solution. Some further work is needed in order to determine the "satisfiability" of the several existence conditions obtained in the paper when the region of interest is an ε-neighborhood of a common probability space, and to further extend the analysis to an investigation of sensitivity and robustness properties of team solutions (obtained under the stipulation of existence of a common underlying probability space) in this ε-neighborhood.

An aspect of the decision problem studied here, which is worth bringing forth, is that the subjective probability measure perceived by each DM is fixed in advance and the DM's do not attempt to change their subjective priors during the course of the decision process. Hence, in this sense, the problem treated here is categorically different from the class of problems treated in [18]-[21], where the objective was for the DM's to arrive at a common (consistent) set of probabilistic descriptions of the unknown variables. In the symmetric mode, there is, however, an implicit learning process built in the recursive process that leads to the stable equilibrium decision rules for each DM, since the DM's do not necessarily have access to each other's perception of the priors.

Yet another aspect of the problem treated in this paper is that the general formulation could be viewed as a multi-modeling in multiple-decision maker problems; however, as opposed to the singular perturbations approach of [22]-[24], here the multi-modeling is in the probabilistic description of the decision problem, with each DM having a different probabilistic model of the "rest of the world."

Appendix A

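In this appendix we state a number of results concerning the spectral radii of linear bounded operators.

Let $A: \Gamma \to \Gamma$ and $B: \Gamma \to \Gamma$ be two linear bounded operators, where $\Gamma$ is a Hilbert space equipped with the inner product $\langle \cdot, \cdot \rangle$. Then the spectral radius of $A$ is defined by

$$r(A) = \limsup_{k \to \infty} \left[ \langle\!\langle A^k \rangle\!\rangle \right]^{1/k} \qquad \text{(A-1)}$$

where $\langle\!\langle A \rangle\!\rangle$ is the norm of $A$, given by

$$\langle\!\langle A \rangle\!\rangle = \sup_{\gamma \in \Gamma} \left[ \langle A\gamma, A\gamma \rangle / \langle \gamma, \gamma \rangle \right]^{1/2} \qquad \text{(A-2)}$$

For self-adjoint operators there is an equivalence between the spectral radius and norm of an operator; specifically, if $A$ is self-adjoint,

$$r(A) = \langle\!\langle A \rangle\!\rangle = \sup_{\gamma \in \Gamma} \left\{ |\langle \gamma, A\gamma \rangle| / \langle \gamma, \gamma \rangle \right\} \qquad \text{(A-3)}$$

[see [13], p. 514]. However, for operators which are not self-adjoint, such an equivalence does not exist, and one can only provide bounds on $r(A)$:

Lemma A.1. For any linear bounded operator $A$,

$$r(A) \le \langle\!\langle A \rangle\!\rangle = [r(A^*A)]^{1/2}$$

Proof. Since $A$ belongs to a Banach algebra, $\langle\!\langle A^k \rangle\!\rangle \le \langle\!\langle A \rangle\!\rangle^k$, and hence

$$r(A) \le \limsup_{k \to \infty} \left\{ [\langle\!\langle A \rangle\!\rangle^k]^{1/k} \right\} = \langle\!\langle A \rangle\!\rangle .$$

Furthermore,

$$\langle\!\langle A \rangle\!\rangle = \sup_{\gamma \in \Gamma} \left[ \langle \gamma, A^*A\gamma \rangle / \langle \gamma, \gamma \rangle \right]^{1/2}$$

which is $[r(A^*A)]^{1/2}$ by (A-3) because $A^*A$ is self-adjoint. □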
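Lemma A.1 can be checked numerically in the simplest finite-dimensional setting. The following Python sketch assumes real 2×2 matrices acting on the plane with the usual inner product (so that the adjoint is the transpose); the helper names and the two test matrices are our own choices for illustration and do not appear in the paper.

```python
import cmath
import math


def matmul(left, right):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(left[i][k] * right[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(matrix):
    """Adjoint of a real matrix under the usual inner product."""
    return [[matrix[j][i] for j in range(2)] for i in range(2)]


def spectral_radius(matrix):
    """Largest eigenvalue modulus, from the characteristic polynomial of a 2x2 matrix."""
    (a, b), (c, d) = matrix
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return max(abs((trace + root) / 2.0), abs((trace - root) / 2.0))


def operator_norm(matrix):
    """Induced Euclidean norm, computed as [r(A*A)]^(1/2) as in Lemma A.1."""
    return math.sqrt(spectral_radius(matmul(transpose(matrix), matrix)))


# A non-self-adjoint example: the spectral radius is strictly below the norm.
NON_SYMMETRIC = [[0.5, 1.0],
                 [0.0, 0.3]]
# A self-adjoint example: the two quantities coincide, as in (A-3).
SYMMETRIC = [[2.0, 1.0],
             [1.0, 3.0]]

for name, matrix in (("non-symmetric", NON_SYMMETRIC), ("symmetric", SYMMETRIC)):
    print(name, spectral_radius(matrix), operator_norm(matrix))
```

The first matrix is upper triangular with eigenvalues 0.5 and 0.3, yet its norm exceeds one, so the bound of Lemma A.1 can be quite loose for operators that are not self-adjoint.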
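Lemma A.2. Let $A$ and $B$ be two linear bounded operators which commute. Then,

(i) $r(AB + A^*B^*) \le 2[r(AA^*)\, r(B^*B)]^{1/2} = 2[r(A^*A)\, r(BB^*)]^{1/2}$

(ii) $r(AB) \le r(A)\, r(B)$

Proof. (i) Since $AB + A^*B^*$ is self-adjoint, using (A-3)

$$r(AB + A^*B^*) = \sup_{\gamma \in \Gamma} \left\{ |\langle \gamma, (AB + A^*B^*)\gamma \rangle| / \langle \gamma, \gamma \rangle \right\} = 2 \sup_{\gamma \in \Gamma} \left\{ |\langle A\gamma, B^*\gamma \rangle| / \langle \gamma, \gamma \rangle \right\}$$

where the equality has followed since $A$ and $B$ commute. Using the Cauchy-Schwarz inequality [3], this expression can be bounded from above by

$$\le 2 \sup_{\gamma \in \Gamma} \left\{ \frac{|\langle A\gamma, A\gamma \rangle|^{1/2}\, |\langle B^*\gamma, B^*\gamma \rangle|^{1/2}}{\langle \gamma, \gamma \rangle} \right\}$$

and performing individual supremization we further obtain the bound

$$\le 2 \sup_{\gamma \in \Gamma} \left[ \frac{\langle A\gamma, A\gamma \rangle}{\langle \gamma, \gamma \rangle} \right]^{1/2} \sup_{\gamma \in \Gamma} \left[ \frac{|\langle B^*\gamma, B^*\gamma \rangle|}{\langle \gamma, \gamma \rangle} \right]^{1/2} = 2 \langle\!\langle A \rangle\!\rangle\, \langle\!\langle B^* \rangle\!\rangle = 2[r(A^*A)\, r(BB^*)]^{1/2}$$

where the last line has followed from Lemma A.1. Note that this expression can be written in different ways because $r(A^*A) = r(AA^*)$, $r(BB^*) = r(B^*B)$.

(ii) Firstly note that

$$r(AB) = \limsup_{k \to \infty} \left[ \langle\!\langle (AB)^k \rangle\!\rangle \right]^{1/k} = \limsup_{k \to \infty} \left[ \langle\!\langle A^k B^k \rangle\!\rangle \right]^{1/k} \qquad (*)$$

where the last equality has followed because $A$ and $B$ commute. Now, since $A$, $B$ belong to a Banach algebra, $\langle\!\langle A^k B^k \rangle\!\rangle \le \langle\!\langle A^k \rangle\!\rangle\, \langle\!\langle B^k \rangle\!\rangle$ for every $k$, which is equivalent to

$$\left[ \langle\!\langle A^k B^k \rangle\!\rangle \right]^{1/k} \le \left[ \langle\!\langle A^k \rangle\!\rangle\, \langle\!\langle B^k \rangle\!\rangle \right]^{1/k} = \left[ \langle\!\langle A^k \rangle\!\rangle \right]^{1/k} \left[ \langle\!\langle B^k \rangle\!\rangle \right]^{1/k} \quad \text{for every } k$$

and taking lim sup of both sides, and using (*),

$$r(AB) \le \limsup_{k \to \infty} \left\{ \left[ \langle\!\langle A^k \rangle\!\rangle \right]^{1/k} \left[ \langle\!\langle B^k \rangle\!\rangle \right]^{1/k} \right\} \le r(A)\, r(B)$$

which proves the desired result. □

Lemma A.3. Let $A$ and $B$ be both self-adjoint. Then,

$$r(A + B) \le r(A) + r(B)$$

Proof. This follows from (A-3) and the triangle inequality applied to the norm $\langle\!\langle \cdot \rangle\!\rangle$. □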
Appendix B

Proof of Theorem 1

Let us first recall the following result from functional analysis (see, for example, [13, Chapter XIII, Theorem 3]).

Lemma B.1. Let $S$ be a linear bounded operator mapping a Hilbert space $\Gamma$ into itself, and consider the equation

$$\gamma = S\gamma + u \qquad \text{(B-1)}$$

defined on $\Gamma$. Furthermore, consider the "successive approximation"

$$\gamma^{(k+1)} = S\gamma^{(k)} + u, \quad k = 0, 1, \ldots \qquad \text{(B-2)}$$

to the solution of (B-1). Then, the sequence generated by (B-2) converges to a unique element of $\Gamma$, for any starting point $\gamma^{(0)} \in \Gamma$, which is further a solution of (B-1), if, and only if, the spectral radius of $S$ is less than unity, i.e. there exists a $\rho$, $0 < \rho < 1$, such that

$$r(S) < \rho . \qquad \text{(B-3)}$$
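The role of the spectral radius condition (B-3) can be seen in a small finite-dimensional experiment. The Python sketch below is an illustration under our own assumptions: the operator is a hypothetical 2×2 real matrix, the forcing term and starting points are arbitrary, and the function names are ours. The first matrix has norm larger than one but spectral radius 0.5, and the successive approximation still converges from either starting point; the second has spectral radius 1.1 and the iterates grow without bound.

```python
def apply_operator(matrix, vector):
    """Matrix-vector product for a square matrix given as nested lists."""
    return [sum(entry * component for entry, component in zip(row, vector))
            for row in matrix]


def successive_approximation(operator, forcing, start, iterations):
    """Run gamma <- S gamma + u for a fixed number of iterations."""
    gamma = list(start)
    for _ in range(iterations):
        image = apply_operator(operator, gamma)
        gamma = [s_gamma + u for s_gamma, u in zip(image, forcing)]
    return gamma


FORCING = [1.0, 1.0]

# Upper triangular, so the eigenvalues are the diagonal entries 0.5 and 0.3;
# the spectral radius is 0.5 although the Euclidean operator norm exceeds one.
CONTRACTIVE = [[0.5, 1.0],
               [0.0, 0.3]]
# Diagonal with an eigenvalue 1.1, so the spectral radius exceeds one.
EXPANSIVE = [[1.1, 0.0],
             [0.0, 0.3]]

limit_from_zero = successive_approximation(CONTRACTIVE, FORCING, [0.0, 0.0], 200)
limit_from_far = successive_approximation(CONTRACTIVE, FORCING, [100.0, -50.0], 200)
runaway = successive_approximation(EXPANSIVE, FORCING, [0.0, 0.0], 200)

print(limit_from_zero, limit_from_far)
print(runaway)
```

Solving the fixed-point equation by hand for the first matrix gives the second component 10/7 and the first component 34/7, which both runs approach. The example also shows why the condition is stated on the spectral radius and not on the norm.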

Now, applying this Lemma to our problem, we identify $S$ with either of the two operators given by (16), $\Gamma$ with $\Gamma_1$ or $\Gamma_2$, the successive approximation (B-2) with (14), and condition (B-3) above with (19) for either $i=1$ or 2. Then, the statement of Thm. 1(i) readily follows from the preceding Lemma, in view of Prop. 2.

Furthermore, since $S$ can be written as the product of two commuting operators, Lemma A.2(ii) bounds its spectral radius by the product of the spectral radii of the two factors. Under (20a) this can be bounded from above by $\rho_1 \rho_2 < 1$, thereby ensuring (19).

On  the  other  hand,  since  the  spectral  radius  of  a  bounded  linear  operator  is  bounded 
from  above  by  its  norm  [13],  and  that  ]j  D^j|  ^  =  <<D^>>^  because  D*  also  maps  into  „ 
(in addition to being a mapping from into itself), (20b) follows. This completes
the proof of Thm. 1. □

Appendix C

1.  Proof  of  Corollary  2  (Section  3) 

Here  we  verify  that  the  second  inequality  of  (20b)  is  implied  by  the 
condition  that  (21a)  is  uniformly  bounded  by  p^.  Towards  this  end,  we  first  have, 
for  each  y€f  ,  from  the  Cauchy-Buniakowski  (Schwarz)  inequality  [3]  applied  to  I".: 


Pjiivili  -  !l/  Py  ,v  (dnjy  )/  y(5)Pj  ,y  (dc|n)||  J  <  j|  /  y(?)pJ  (dC !  n)||  J 
Y .  yj : '  i  Y.  yiiyj  Y,  yi'yj  1 


*  /  (/  y(5)P  l  I  (d?|n)  ,  /  y(C)P^  I  Cd^ !  n) ) .  P^  (drOg^n) 

Yj  Yi  ^  yi  ^  ^ 

where the last equality has involved a change of measures, using the R-N derivative $g^j(\eta)$. Now, again using the Cauchy-Schwarz inequality, this expression can be bounded from above by


<  /  /  (y(0.(y(5))<p£  (v  (d£|n)gJ(n)Pi  (dr) 

Y.Y.  yi 1 '  j 

1  i 


yi 


/  (y(5),(y(5))4P*  .  (dOgL(0  /  .  (dnU)gj(n) 

Yt  yi;7j  Yj  yjyi 


where the last equality has followed from Bayes' Theorem. It now readily follows that under the condition of Corollary 2, the last expression is bounded from above by $\rho_i^2 \|\gamma\|_i^2$, thus proving the desired result for $i=1,2$. □

2.  Proof  of  Corollary  4  (Section  4) 

The fact that uniform boundedness of (43a) (by $(\rho_0)^2$) implies the first inequality of (40b) follows readily from the proof given above, since the spectral radius of 1 1 ^l'l is equal to the square of the norm of 1 1 '. Now, to verify that uniform boundedness of (43b) implies the second inequality of (40b) we follow basically the same line of reasoning, but the details of the proof are more involved. Towards this end we first note that for each $\gamma \in \Gamma_1$,

#Ky^  =  II  g1  (€)  /  P2  iv  (dn|y1=?)g2(n)  /  P2  i  (db|y2=n)>  (b)H 2 
1  Y?  y2)yl  Y1  yl|y2 


=  «  y g1(S)  /  P2  -  (dniy  =c)g2(n)  /  P2  .  (db |y  -n) y(b)H 2 
Y2  y2 |yl  Y^  yl  y2 

1  II  / g1(5)g2(n)  /  P2  ,  (db|y  =n)Y(b)»2 


Yx  yl * y2 

where the second equality follows from a change of measures, and the last bound follows from the Cauchy-Schwarz inequality. It should be pointed out that here we have abused the notation and have used $\|m(\xi,\eta)\|_2$ to mean

$$\|m(\xi,\eta)\|_2 = \left\{ \int\!\!\int_{Y_1 \times Y_2} \big(m(\xi,\eta), m(\xi,\eta)\big)_1 \, P^2_{y_1 y_2}(d\xi \times d\eta) \right\}^{1/2}$$

where $m$ is a $Y_1 \times Y_2$-measurable random variable taking values in $U_1$; hence, the sub-index "2" indicates that the probability space is the one determined by the subjective probability measure of DM2.

Now,  the  latter  bound  can  further  be  bounded  above  by 

ill  g1(?)!g20)i2P2  _  (d£xdn)  /  P2  ,  (db|y  =n)(Y(b),Y(b)). 

rr  J  2  ~  y-  I  ^  ^ 


Y  Y 
12 


Y  yl ' y2 


since (i)

$$\Big( \int_{Y_1} P^2_{y_1|y_2}(db \,|\, y_2=\eta)\, \gamma(b),\ \int_{Y_1} P^2_{y_1|y_2}(db \,|\, y_2=\eta)\, \gamma(b) \Big)_1 \le \int_{Y_1} P^2_{y_1|y_2}(db \,|\, y_2=\eta)\, \big(\gamma(b), \gamma(b)\big)_1$$

by the Cauchy-Schwarz inequality (because $P^2_{y_1|y_2}$ is also a probability measure) and (ii) $g^1(\xi)\, |g^2(\eta)|^2 \ge 0$. Hence, by interchanging the variables $\xi$ and $b$,


IlKyll^  < 


/  /  /  (y(0,y(0),  s1  (b) !  g2(n) !  2pJ  (dbxdn)p;  ,  (dn|y  *C)P  (dE) 


Y1Y2Y1 


2  71 


(dn) 


.  P‘ 


(db 1  y0"r'1)P  .  (dnjy,®?) 

yiiy2  2  y2lyi  1 

and  under  (43b)  this  can  be  bounded  above  by 


±1  W5)(t(C),y(5))1  >1  *  Oj'vlJ 


which  completes  the  proof. 


Appendix  D 

1. Derivation of First and Second Gateaux Variations [(25)-(26)]

Starting  with  the  expression  for  J  as  given  by  (23),  we  first  obtain 

AJ (y ;h)  -  J(y+h)  -  J(y) 

=  y  <h,y>1  +  y  <Y,h>1  4-  y  <h,h>1 

+  k  f  {(F^txjy,]  +  D^1E2[v(y1)|y2]  ,  E“[h(y1)  I  y,  ] )  2 


+  (DT1E“[h(y1)  |y,] ,  D^2F“  E“[x!y9]  +  E2  [  v(yL)  :  y ,  ] ) 


+  (D;iE2[h(y1);v2],Di2D^1E2[h(y1);y2]):}  P^(d() 


-  <h,Ei[Dj2D21E2[Y(y1)|y2]  yj  +  E 1 [ 2 E 2 [ x | y 2 ] | y x 2 

-  <Y,E1[D^D21E2[h(y1)|y2]|y1]>1  -  <h,Dj2D^E1[E2[h(y1)  |y2]  jyj 

5  5J(y;h)  +  62J(y ;h)  . 

Now, since $\delta J(\gamma;h)$ is homogeneous of degree one, and $\delta^2 J(\gamma;h)$ is homogeneous of degree two, $\Delta J(\gamma;h)$ admits a unique decomposition with the corresponding expressions being (after some simplification)


* 

6J (y ;h)  =  <h,Y>1  +  /  (E2[h(y1) ly2] , 

Y2 

+  D21E2[Y(y1)|y2l»1  P*  (d$)  -  <h,E1[F2xjy1]>1 


( D— 1 ) 


-/  (E2[h(y5)|y9],  D2  F^x)  PX(dx,Y  dO  ' 

XxY_  “ 

1  ?  * 

-  <h,D22D21P1,1Y(y1)  +  D22F2E1[E^[x|y2J|y1)>1  -  <7^,  D^D^Y^ 


52J(Y;h)  -  ~  <h,h>1  +  j  f  (E2[h(y;L)  |y2 3 ,  D21D22D21 

Y2 

.  E2[h(y1)|y2))1  P2  (dO  -  <h,Dj2D2iPi;ih>i 


( D-2 ) 


where  we  have  used  some  properties  of  adjoint  operators  under  inner  products, 
and  the  notation  introduced  in  (28) ;  we  have  also  made  use  of  the  fact  that  the 
bounded  linear  operator  D12D21‘-  commutes  with  the  double  conditional 

expectation  operator  P^j^  (°r  j  ^)  • 

We now prove a lemma which will be used in simplifying these expressions further.
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Lemma  D . 1 .  For  h(*)eU^  ,  fC*)6^’ 

/  (E2 [h(y  ) | y-=£ ] ,  f(5)K  P2  (d€)  = 

Y2  12  y2  CD-3) 

/  (h(n),  g2(n)E2[g2(y3)f (y2) i y,=n] ) ,  P2  (dn) 

Y2  y! 

=  <h,g1(y1)E2[g2(y2)f (y2) !y1]>1 

where  gX(*)  are  given  by  (2). 


Proof.  The  proof  follows  from  the  following  set  of  equalities  where  we  are  allowed  to 
change  orders  of  integration  because  11^  and  are  Hilbert  spaces  of  random  variables 
well  defined  under  both  measures: 


J  (E2[h(y  )|y  -5],  f(5))-  P2  (d£)  =  /  (/  h(n)P2  ,  (dn | S) ,  f(C)),P2  (d() 
v  y  i  i  y  ^ 


Y  Y  y]J  y? 

2  X1  1  “ 


=  /  j  (h(n),  f(4)),P2  .  (dnU)P2  (d5) 

Y2  Y  1  yl|y2  y2 

-  /  /  (h(n),  f(5)).pj  |  (d^l y  =n)g2(n)g2(?)P2  (dn) 

Y,  Y  x  y2  yl  1  yl 

where, in the next to the last line, we have used the continuity property of the inner product in pulling out $P^2_{y_1|y_2}(d\eta \,|\, \xi)$. Now, pulling the integration over $Y_2$ into the inner product, we further obtain


!  P2  (dn)  [  (h(n) ,  j  g2(e)f(e)P2  ,  (d?|y  -n))  g^n) 

Yl  '1  Y2  y2  '  1 

/  P2  (dn) (h(n) ,  g2(n)  E2 [g2 (y  ) f (v,) | y  =n ] ) 

Y  '  1 


which  is  the  desired  result. 


Now,  using  (D-3)  in  (D-2)  we  obtain 


:”J(<  ;  h)  =  \  'h.h^  \  /  (h(n) ,  g2  ( -  )D2,  [g  (y J  E  [ h ( y ^ ;  y ^  1  ”1  =  " 
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1 

2 


<h’D12D21Pl llh>l 


*  ★ 

2  <h,D21D12Pl  ! lh> 1 


which verifies (26).

To verify (25), we apply the result of Lemma D.1 to (D-1) to obtain

* 

5J (y ;h)  =  <h,y>1  +  <h,g1(y1)D21D22  (F2E2 [g2 (y2)E2 [x| y2 ] | y^ 

+  D21EZ[g2(y2)E2[Y(y1) \y2] !y1l>>1  -  <h,F2E1[x|y1]>1 

*  *  * 

-  <h,g1(y1)D21F^E2[g2(y2)E1[x|y2]|y1]>1  -  <h, (D12D21P1 | 1  +  D21D12*l| 1} Y>1 


-  <h,D22F2E1[E2[x|y2] |y1l>1  =  <h,Y>1  -  <h,'Y>1  -  <h,3>1 

where  s  and  3  are  defined  by  (27a)  and  (27b),  respectively.  This  then  completes 
the  verification  of  (25)  and  (26) . 

2. Derivation of an Expression for $P_{1|1}^*$, the adjoint of $P_{1|1}$

Firstly  note  that 


/  (P*  1Y(y1),h(y  ))  P*  (dy  )  -  /  (7(7^,  p1i1h^i>)iPv  (dyl) 

Y  1  1  yl  Y1  ‘1 

s  E1[(Y(y1),E1[E2[h(y1) |y2] |y1])1]  =  E1 [ (y (y^ ,E2 [h^) j y2 ] ) ^ ] 


where  we  have  used  the  smoothing  property  of  conditional  expectation  under  the 

probability  measure  P2  .  Now,  a  further  conditioning  under  i  yields 

yi  yi'  y 2 

-  E1[(E1[Y(y1)|y2],  E2[h(y1)|y2])1], 
and  using  ( D—  3 )  [cf.  Lemma  D-l]  this  becomes  equivalent  to 
=*  E1[(g1(y1)Ei[g2(y2)E1(y(y1)  )y2]  jyL]  ,  My^)^, 


thus proving (32). The first expression in (32) follows by routine manipulations.

Appendix E

In this appendix we show that the Stackelberg solution satisfying (10)-(11) is indeed an equilibrium solution, namely the so-called strong equilibrium of a decision problem with a modified (dynamic) information pattern. Towards this end, let us replace the original decision problem with one in which the decision (action) variables are $\gamma_1 \in \Gamma_1$ and $\gamma_2 \in \Gamma_2$, for DM1 and DM2, respectively, and the information pattern is dynamic (for DM2), with DM2 having access to the decision $\gamma_1$ of DM1. Let $U_1$ and $U_2$ denote the strategy spaces of DM1 and DM2, respectively, under this new information pattern; furthermore denote their generic elements by $\beta_1$ and $\beta_2$, respectively. Now, since DM1 has static information, all permissible policies $\beta_1$ will be constant mappings: $\beta_1 = \gamma_1$, and hence $U_1 = \Gamma_1$. For DM2, on the other hand, all permissible policies will be measurable mappings $\beta_2: \Gamma_1 \to \Gamma_2$. Finally, let $\tilde{J}_i$ be the cost function of DMi, satisfying the boundary condition

$$\tilde{J}_i(\beta_1, \beta_2) = J_i(\gamma_1, \gamma_2), \quad \forall \beta_1 = \gamma_1 \in \Gamma_1 = U_1 \qquad \text{(E-1)}$$

where $\gamma_2 \in \Gamma_2$ is uniquely defined for each $\gamma_1 \in \Gamma_1$ by

$$\gamma_2 = \beta_2(\gamma_1) \quad \text{in } \Gamma_2 . \qquad \text{(E-2)}$$

Now, let $(\gamma_1^s, \gamma_2^s) \in \Gamma_1 \times \Gamma_2$ be a Stackelberg solution to the original decision problem with the unique mapping $T_2$ satisfying (10). Note that $T_2 \in U_2$, and hence relabelling $T_2$ as $\beta_2^s$, and $\gamma_1^s$ as $\beta_1^s$, in (10) and (11), we obtain in view of (E-1)-(E-2)

$$\tilde{J}_1(\beta_1^s, \beta_2^s) \le \tilde{J}_1(\beta_1, \beta_2^s) \quad \forall \beta_1 \in U_1$$

$$\tilde{J}_2(\beta_1, \beta_2^s) \le \tilde{J}_2(\beta_1, \beta_2) \quad \forall (\beta_1, \beta_2) \in U_1 \times U_2,$$

which clearly indicate that $(\beta_1^s, \beta_2^s) \in U_1 \times U_2$ is a noncooperative Nash equilibrium.
This is, in fact, a stronger equilibrium (called "strong equilibrium" [17]) because the second inequality is satisfied not only for $\beta_1 = \beta_1^s$ but for all $\beta_1 \in U_1$.

Abstract: In this paper discrete and continuous-time two-person decision problems with a hierarchical decision structure are studied and applicability and appropriateness of a function-space approach in the derivation of causal real-time implementable optimal Stackelberg (incentive) strategies under various information patterns are discussed. Results on existence and derivation of incentive strategies for dynamic games formulated in abstract inner-product spaces, in the absence of any causality restriction on the leader's policies, are first presented and then these results are extended (and specialized) in two major directions: 1) discrete-time dynamic games with informational advantage to the leader at each stage of the decision process, which involves partial observation of the follower's decisions; and derivation of multistage incentive strategies for the leader under a feedback Stackelberg solution adapted to the feedback information pattern; and 2) derivation of causal, physically realizable optimum affine Stackelberg policies for both discrete and continuous-time problems, in terms of the gradients of the cost functionals evaluated at the optimum (achievable) operating point (which is in some cases the globally minimizing solution of the leader's cost functional). The paper is concluded with some applications of the theory to important special cases, some extensions to infinite-horizon problems, and some numerical examples that further illustrate these results.

I. Introduction and a General Description of the Stackelberg Problem

A. General Introduction

The presence of multiple decisionmakers is a common phenomenon in many large-scale decision problems, especially if they involve humanistic and socioeconomic elements. The decisionmakers may have noncommensurable, and at times conflicting, preferences, or they may have basically the same goal but may wish to decentralize the decisionmaking process in order to alleviate the heavy burden of acquiring, transmitting, and processing the excessive amount of information needed for a centralized control (Athans [1], Başar and Cruz [5]). In either case, the decisionmakers (or players, in the terminology of game theory) would have different objective functions, and acquire possibly different information in the decisionmaking process. Furthermore, there would be an order in which the decisionmakers act and/or announce their policies, and this order would either be fixed (predetermined) or determined as a consequence of the players' actions. All these factors contribute to the concept of solution to be adopted for a general multiperson decisionmaking problem, and have to be taken into account before the derivation of the solution process.

There is a growing variety of solution concepts in dynamic game theory (such as team solution, Pareto solution, Nash solution, etc.; see Başar and Cruz [5], Başar and Olsder [7]), and among these the Stackelberg (or leader-follower) strategies (Cruz [12], [13]) have recently attracted more and more attention, in both the control and economics literatures. This concept was first introduced by Von Stackelberg [26] for a class of static decision problems arising in economics. Then its dynamic version was presented in a control theoretic framework by Chen and Cruz [11] and Simaan and Cruz [24], [25]. This solution concept is especially suitable for hierarchical multilevel decision problems wherein the decisionmakers hold nonsymmetric roles in the decisionmaking process. One of the players, called the leader, occupies a higher decision level, and this superior position enables him to announce his strategy in advance and enforce it on the other players. By taking into account the optimal responses of the followers, which may be determined as the solution of some other multiperson decision problem under a specific solution concept relevant to that problem (see [18], [3]), the leader seeks the policy that leads to a most favorable outcome for him.

Such situations arise in many real world problems. In a large organization, the headquarters decisionmaker (the leader) cannot dictate every subdivision's (the follower's) task in fine detail; in its stead, it simply announces and executes appropriate strategies (policies), such as the resource allocation strategy, penalty or reward policy, the profit-sharing policy, etc., so as to induce the subdivisions to work in accordance with the interests of the entire organization ([2], [15], [16], [19], [22]). Some recent investigations have been devoted to the study and construction of such leader-follower strategies in special types of organizations. For example, a standard and efficient way for a government (the leader) to solve the water pollution problem is to design some subsidy programs or penalty policies to encourage or induce the chemical plants (the followers) to act cooperatively. A utility company (the leader) may use a price strategy (or a pricing strategy) to induce the customers (the followers) to consume the utility resource more reasonably ([18], [20]). In a market with both free competition and government adjustment, the government (the leader) may design a strategy of adjusting the effective income of the potential buyers of the commodity so as to induce the competing duopolistic firms to cooperate and achieve a Pareto-optimal solution [23]. All of these problems can be studied in the framework of Stackelberg dynamic game theory, thus making this new field very promising in applications.

0018-9472/84/0100-0010$01.00 © 1984 IEEE

ZHENG et al.: STACKELBERG STRATEGIES AND INCENTIVES

B.  General  Description  of  the  Stackelberg  Problem 

To be more precise in our description of a Stackelberg
game and the related solution concepts, let us now consider
a two-person dynamic game problem with a hierarchical
decision structure under which player 1 acts as the leader
and player 2 as the follower. The state $x(\cdot)$ of the underlying
decision process evolves according to either (in continuous time)

$$\dot{x}(t) = f(t, x(t), u(t), v(t)), \quad t \in [0, T) \tag{1}$$

or (in discrete time)

$$x(k+1) = f(k, x(k), u(k), v(k)), \quad k = 0, 1, \ldots, N-1 \tag{2}$$

where $u(\cdot)$ is the leader's decision variable and $v(\cdot)$ is the
follower's, and they are either time-functions or time-series,
belonging to the corresponding Hilbert spaces:

$$u(\cdot) \in L_2^{m_1}[0, T), \quad v(\cdot) \in L_2^{m_2}[0, T)$$

in continuous time, or

$$u(\cdot) \in l_2^{m_1}[0, N-1], \quad v(\cdot) \in l_2^{m_2}[0, N-1]$$

in discrete time.
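As a minimal sketch of the discrete-time evolution (2), the following rolls out a state trajectory for given open-loop decision sequences. The function name `simulate` and the scalar dynamics `f_example` are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def simulate(f, x0, u_seq, v_seq):
    """Roll out x(k+1) = f(k, x(k), u(k), v(k)) for k = 0, ..., N-1."""
    if len(u_seq) != len(v_seq):
        raise ValueError("u_seq and v_seq must have the same length N")
    xs = [x0]
    for k, (u, v) in enumerate(zip(u_seq, v_seq)):
        xs.append(f(k, xs[-1], u, v))
    return xs


# Hypothetical scalar dynamics (an assumption made for this example).
def f_example(k, x, u, v):
    return 0.5 * x + u - v


trajectory = simulate(f_example, x0=1.0,
                      u_seq=[1.0, 0.0, 0.0],
                      v_seq=[0.0, 0.5, 0.0])
print(trajectory)
```

For fixed $x_0$, each pair of decision sequences produces exactly one trajectory, which is the uniqueness property the text relies on when costs are written as functions of the decisions alone.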

Generically, let us denote the decision variables of the
leader and the follower by $u$ and $v$, respectively, and the
decision spaces by $U$ and $V$. Furthermore, let $X = L_2^n[0, T)$
or $l_2^n[0, N-1]$ denote the state space for the process,
where $x(\cdot)$ belongs, and let $Y_1 \subset X \times V$ and $Y_2 \subset X \times U$
denote the information (observation) spaces of the leader
and the follower, respectively. A permissible policy
(strategy) $\gamma_i \in \Gamma_i$ for player $i$ is a Borel-measurable mapping
from his observation space into his decision space,
satisfying some additional regularity conditions like causality,
Lipschitz continuity, etc., that will be delineated later in
proper contexts. One underlying assumption here is that,
with $x_0 \in \mathbb{R}^n$ fixed, to each $(\gamma_1, \gamma_2) \in \Gamma_1 \times \Gamma_2$ there corresponds
a unique state trajectory $x(\cdot) \in X$ and a unique
pair of cost values $\{J_1(\gamma_1, \gamma_2), J_2(\gamma_1, \gamma_2)\}$.

The Stackelberg game problem involves, in a nutshell,
determination of a leader's policy $\gamma_1^* \in \Gamma_1$ satisfying

$$J_1(\gamma_1^*, T(\gamma_1^*)) = \min_{\gamma_1 \in \Gamma_1} J_1(\gamma_1, T(\gamma_1)) \tag{3}$$

where $T: \Gamma_1 \to \Gamma_2$ is the unique rational response mapping
of the follower, i.e.,

$$T(\gamma_1) = \arg\min_{\gamma_2 \in \Gamma_2} J_2(\gamma_1, \gamma_2) \tag{4}$$

where we tacitly assumed existence of a unique solution to
(4).
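The two-level structure of (3) and (4) can be sketched numerically for a static scalar game in which strategies and decisions coincide. The quadratic costs `J1` and `J2`, the search interval, and the golden-section helper are assumptions made for this example only.

```python
def golden_min(h, lo, hi, tol=1e-9):
    """Minimize a unimodal scalar function h on [lo, hi] by golden-section search."""
    inv_phi = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if h(c) < h(d):
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d
            d = a + inv_phi * (b - a)
    return 0.5 * (a + b)


# Hypothetical strictly convex costs (not from the paper).
def J1(u, v):  # leader
    return (u - 1.0) ** 2 + v ** 2


def J2(u, v):  # follower
    return (v - u) ** 2 + v ** 2


def rational_response(u):
    """T(u) = argmin_v J2(u, v), the follower's reaction as in (4)."""
    return golden_min(lambda v: J2(u, v), -10.0, 10.0)


def stackelberg_solution():
    """The leader minimizes J1(u, T(u)), as in (3)."""
    u_star = golden_min(lambda u: J1(u, rational_response(u)), -10.0, 10.0)
    return u_star, rational_response(u_star)


u_star, v_star = stackelberg_solution()
print(u_star, v_star, J1(u_star, v_star))
```

For these costs the reaction is $T(u) = u/2$, so the leader's problem reduces to minimizing $(u-1)^2 + u^2/4$, whose minimizer is $u = 0.8$. The leader's resulting cost is strictly above the unconstrained minimum of $J_1$, which is the gap the incentive strategies discussed later aim to close.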

Even though this definition is valid for all types of
information available to the players (i.e., for arbitrary $Y_1$
and $Y_2$), the derivation of the solution will depend to a
great extent on the underlying information structure, as will
be elucidated in the sequel.

1) Open-Loop Information Structure: The players' information
comprises only the a priori information, e.g., the
structural parameters of the problem and the initial conditions.
In this case, strategies and the decision variables
coincide, and are chosen as time-functions from the beginning.

Necessary conditions for the open-loop solution of
Stackelberg dynamic game problems can be obtained
without any conceptual difficulties, although it is rather
difficult to solve them analytically or even numerically [13],
[7, ch. 7].

2) Closed-Loop Information Structure: Here the leader is
assumed to acquire state information with memory, i.e.,
elements of $Y_1$ are given by $y_1(t) = \{x(\tau), \tau \le t\}$ or
$y_1(k) = \{x(l), l = 0, \ldots, k\}$, thus leading to policies in the form
$u(t) = \gamma_1(t; x(\tau), \tau \le t)$ or
$u(k) = \gamma_1(k; x(k), x(k-1), \ldots, x(0))$. The follower, on the other hand, could acquire
closed-loop or open-loop information. Any direct approach
towards the solution of such dynamic Stackelberg games
meets with formidable difficulties, since the optimization
problem (4) is "structurally" dependent on the structure of the
leader's strategy, that is, the follower faces an optimization
problem "parameterized" by the structure of $\gamma_1$. Such
"nonclassical" optimal control problems and indirect ways
of obtaining the solution have been discussed in many
papers; see [21], [8], [9], [6], [3], [27], and it has been shown
that in certain cases the leader can achieve the global
minimum value of his cost function $J_1$. This feature has
also been established by an "indirect method" [4] under
two conditions: 1) the leader can detect the follower's
action (detectability); and 2) by choosing an appropriate
strategy the leader is able to threaten the follower by severe
punishment in case of any deviation from the desired
solution trajectory (enforceability). It has been shown,
moreover, that the closed-loop information is rich enough
to allow for the solution to satisfy additional design specifications.
One such specification involves a "robustness"
feature; that is, in case of a deviation from an optimal
path, not to punish the follower indefinitely at all future
stages, but rather use an effective threat policy which
would carry a punitive action role for only a few (two or
three) stages. This aspect of the problem and its solution
has been discussed in some recent papers in the literature;
see, e.g., [27].

3) Feedback Strategies and the Feedback Stackelberg
Solution Concept: A subclass of closed-loop strategies comprises
those policies that depend only on the current value
of the state without memory; that is,

$$y_1(t) = \{x(t)\} \quad \text{or} \quad y_1(k) = \{x(k)\}.$$

Under such a feedback information pattern for the leader,
the Stackelberg solution is still very difficult to obtain, and
in fact, in most cases, it will not even exist if the initial
state $x(0)$ is taken as a variable and a solution is sought for all
$x(0) \in \mathbb{R}^n$. A way to circumvent this difficulty, in the case
of discrete-time problems, is to require that any subprocess-to-go
is also an optimal process in the Stackelberg
sense [25]. This permits the adoption of a dynamic programming
type approach which involves the solution of
static Stackelberg games at each stage (and in retrograde
time). In comparison with the Stackelberg solution, the
feedback Stackelberg solution gives only a suboptimal
solution, though it has the advantage of being simpler in
structure, computationally feasible and implementable.
Furthermore, it has better robustness properties against
noise and disturbance, since the leader can update his
policy at each stage of the decision process.
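The stage-by-stage construction can be sketched for a scalar linear-quadratic game, solving one static Stackelberg game per stage in retrograde time. The dynamics $x(k+1) = ax + bu + cv$, the stage costs $g_1 = q_1 x^2 + r_1 u^2$ and $g_2 = q_2 x^2 + s_2 v^2$, and the quadratic terminal costs are assumptions chosen for this illustration; the function name is hypothetical.

```python
def feedback_stackelberg_lq(a, b, c, q1, r1, q2, s2, qN1, qN2, N):
    """Backward recursion for a scalar LQ feedback Stackelberg game.

    Returns per-stage gains (K1, K2) with u(k) = K1 x(k), v(k) = K2 x(k),
    and the coefficients (P1, P2) of the costs-to-go P_i * x(0)^2.
    """
    P1, P2 = qN1, qN2
    gains = [None] * N
    for k in reversed(range(N)):
        # Follower's reaction to a given u: v = -kappa * (a x + b u).
        kappa = c * P2 / (s2 + c * c * P2)
        theta = s2 / (s2 + c * c * P2)   # so that x(k+1) = theta * (a x + b u)
        # Leader minimizes r1 u^2 + P1 theta^2 (a x + b u)^2 over u.
        K1 = -P1 * theta ** 2 * a * b / (r1 + P1 * theta ** 2 * b * b)
        K2 = -kappa * (a + b * K1)
        closed = theta * (a + b * K1)    # closed-loop state multiplier
        P1 = q1 + r1 * K1 ** 2 + P1 * closed ** 2
        P2 = q2 + s2 * K2 ** 2 + P2 * closed ** 2
        gains[k] = (K1, K2)
    return gains, (P1, P2)


gains, (P1, P2) = feedback_stackelberg_lq(
    a=1.0, b=1.0, c=1.0, q1=0.0, r1=1.0, q2=0.0, s2=1.0,
    qN1=1.0, qN2=1.0, N=1)
print(gains, P1, P2)
```

Because each stage only needs the quadratic cost-to-go coefficients from the following stage, the computation stays simple no matter how long the horizon is, at the price of the suboptimality noted in the text.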

4) Incentive Strategies: As we have indicated in case 2),
the leader may expect to achieve the global optimum of his
cost function (the team solution), as though the follower
was cooperating with him, provided that he has memory,
can detect the follower's actions and can announce and implement
enforceable policies. In order to investigate the
Stackelberg problem from this viewpoint, we include the
decision $v(\cdot)$ of the follower directly in the information
$y_1(\cdot)$ available to the leader, thereby allowing the leader to
adopt a strategy in the form

$$u(\cdot) = \gamma_1(x(\cdot), v(\cdot)) \tag{5}$$

which explicitly displays the dependence of the leader's
decision variable $u$ on the follower's, $v$. Such a dependence
is not necessarily instantaneous, and may involve delays;
furthermore, $y_1$ may carry only partial information on $v$,
such as the one obtained through the present and past
values of the state. Whatever the nature of the dependence
is, such a structure (as in (5)) is called an incentive strategy,
because it displays the extent of the leader's power in
enforcing a certain action on the follower through a
punishment or reward scheme and by utilizing the information
acquired through $y_1$ ([18], [28]).

C. Outline of the Following Sections

This paper is devoted to an extensive discussion and
derivation of closed-loop Stackelberg strategies and incentive
policies in dynamic decision problems of the types
introduced above, and an elaboration on their properties.
In the next section we first discuss the incentive decision
problem when the leader's permissible strategies are of the
form (5), in abstract inner-product spaces, and present
some general results on the existence and derivation of
linear incentive policies. These results are then extended in
Sections III and IV in two different directions. In Section
III we treat the discrete-time Stackelberg problem with
dynamic informational advantage to the leader at each
stage of the game, and under the feedback Stackelberg
solution concept. General conditions are obtained for existence
of a solution and for this solution to coincide with
the global Stackelberg solution. In Section IV we extend
the results of Section II to derivation of causal incentive
schemes and construction of real-time closed-loop Stackelberg
strategies from a normal-form description, in both
discrete and continuous time. Some applications to important
special cases with illustrative numerical examples
are given in Section V.

II. Some General Results on Existence and
Derivation of Optimal Incentive Strategies

In this section we consider an abstract reformulation of
the dynamic game problem of Section I-B with the leader
allowed to have a partial measurement of the follower's
decision variable $v$. Towards this end, let $U$ and $V$ be
Hilbert spaces, with elements $u$ and $v$, respectively, and the
cost functional $J_i$ of player $i$ ($i = 1, 2$) be a mapping from
$U \times V$ into $\mathbb{R}$. In this reformulation, the dynamic nature of
the decision process is suppressed, $\Gamma_2 = V$, and $\Gamma_1$ is the
class of all Borel-measurable mappings from $Y$ into $U$,
where $Y$ is a Hilbert space comprising observations of the
form

$$y = Nv$$

where $N: V \to Y$ is a linear operator with full range in $Y$.
The case when $N$ is invertible is known as the perfect
information case; otherwise we say that the leader has only
partial information on the actions of the follower.

A.  Perfect  Information  Case 

Let $(u^d, v^d) \in U \times V$ be a desirable solution from the
point of view of the leader; this point could, for example,
be chosen as the global minimizer of the leader's cost
function $J_1(u, v)$ over $U \times V$, if such a solution exists.
Then, an optimal incentive policy for the leader is one that
forces the follower to choose the decision $v^d$, by making the
incurred cost corresponding to $v \neq v^d$ sufficiently large; in
other words, for a given incentive strategy $\gamma_1$ to be implementable
it should satisfy the strict inequality

$$J_2(u = \gamma_1(v), v) > J_2(u^d, v^d), \quad \text{for all } v \neq v^d \tag{6}$$

together with the side condition

$$\gamma_1(v^d) = u^d. \tag{7}$$

To formalize this concept, we introduce the set

$$\Omega_d = \{(u, v) \in U \times V : J_2(u, v) \le J_2(u^d, v^d)\} \subset U \times V \tag{8}$$

and immediately arrive at the following result.

Proposition 1: A desired decision pair $(u^d, v^d) \in U \times V$
can be induced by an incentive strategy $\gamma_1 \in \Gamma_1$, if to each
$v \in V$, $v \neq v^d$, there corresponds a $u = \gamma_1(v) \in U$ such
that $(u, v) \in \Omega_d^c$, the complement of $\Omega_d$. A strategy that accomplishes this is the
so-called (discontinuous) threat policy given by

$$\gamma_1(v) = \begin{cases} u^d, & \text{if } v = v^d \\ \text{any } u \text{ such that } (u, v) \in \Omega_d^c, & \text{if } v \neq v^d. \end{cases} \tag{9}$$
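A threat policy of the form (9) can be sketched for a scalar example. The follower cost `J2`, the desired pair, and the punishment level `u_punish` are hypothetical choices for illustration; the only requirement used is that the punishing decision puts the follower strictly above the cost level $J_2(u^d, v^d)$ for every deviation.

```python
# Hypothetical follower cost (not from the paper).
def J2(u, v):
    return u ** 2 + (v - 1.0) ** 2


def make_threat_policy(u_d, v_d, u_punish):
    """Discontinuous policy: play u_d on the desired v_d, otherwise punish."""
    def gamma1(v):
        return u_d if v == v_d else u_punish
    return gamma1


u_d, v_d = 0.0, 0.0
level = J2(u_d, v_d)                      # follower's cost at the desired pair
gamma1 = make_threat_policy(u_d, v_d, u_punish=2.0)

# Check on a grid that every deviation lands outside the set Omega_d,
# and that the follower's best choice under the policy is v_d.
grid = [i / 100 for i in range(-300, 301)]
deviations_punished = all(J2(gamma1(v), v) > level for v in grid if v != v_d)
v_best = min(grid, key=lambda v: J2(gamma1(v), v))
print(deviations_punished, v_best)
```

Without the threat, the follower facing the fixed decision $u^d = 0$ would pick $v = 1$; under the policy, any $v \neq 0$ costs at least 4, so $v = 0$ becomes the rational response.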

Remark 1: The preceding proposition provides a sufficient
condition for existence of an optimal incentive
strategy. This condition is also necessary if we make an
additional behavioral assumption on the follower, which is
that on the boundary of $\Omega_d$ (which is his indifference curve)
he chooses points that are detrimental to the leader.

The next proposition shows that the hypothesis of Proposition 1
is satisfied for an important class of problems.

Proposition 2: If $J_2(u, v)$ is continuous and strictly convex
on $U \times V$, any desired decision pair $(u^d, v^d) \in U \times V$
is inducible by an appropriate incentive strategy.

Proof: Since $J_2(u, v)$ is continuous and strictly convex,
the set $\Omega_d$ is closed and strictly convex. We now prove the
proposition by contradiction. Assume that there exists a
$(u^d, v^d) \in U \times V$ which cannot be induced by an appropriate
incentive strategy. That is, there exists a $\bar{v} \in V$,
$\bar{v} \neq v^d$, such that $(u, \bar{v}) \in \Omega_d$ for every $u \in U$. Let
$v_\alpha = \alpha v^d + (1 - \alpha)\bar{v}$, $0 < \alpha < 1$; then
$(\bar{u}, v_\alpha) = \alpha(u^d, v^d) + (1 - \alpha)(u_\alpha, \bar{v}) \in \Omega_d$,
where $u_\alpha = \frac{1}{1 - \alpha}(\bar{u} - \alpha u^d) \in U$.
When $\alpha \to 1$, $(\bar{u}, v_\alpha) \to (\bar{u}, v^d)$ and hence the limit point
$(\bar{u}, v^d)$ belongs to $\Omega_d$ for every $\bar{u} \in U$. In particular, if $\bar{u}$ is
chosen as $u^d + u_0$ and $u^d - u_0$ ($u_0 \in U$), the convex combination
$(u^d, v^d) = \frac{1}{2}(u^d + u_0, v^d) + \frac{1}{2}(u^d - u_0, v^d)$
should be an inner point of the strictly convex set $\Omega_d$. This
is contradictory to the fact that $(u^d, v^d)$ is a boundary
point of $\Omega_d$, and this completes the proof.

Incentive policies that induce the pair $(u^d, v^d)$ under the
hypotheses of Proposition 2 are not only of the type (9),
but could also be continuous and even continuously differentiable.
However, if we further restrict the class of
incentive strategies to affine ones (because of their simple
structure), we have to impose an additional restriction on
$J_2$, as elucidated in Proposition 3 below, whose proof
can be found in [28].

Proposition 3: For an incentive Stackelberg game, let
$J_2(u, v)$ be strictly convex and Fréchet differentiable on
$U \times V$, and let its gradient with respect to $u$, evaluated at the
desired decision point $(u^d, v^d) \in U \times V$, not vanish,
i.e.,

$$\nabla_u J_2(u^d, v^d) \neq 0. \tag{10}$$

Then, the desired decision pair can be induced by an affine
incentive strategy

$$\gamma_1(v) = u^d - Q(v - v^d) \tag{11}$$

where $Q: V \to U$ is a linear operator whose adjoint
$Q^*: U \to V$ satisfies the equation

$$\nabla_v J_2(u^d, v^d) = Q^* \nabla_u J_2(u^d, v^d) \tag{12}$$

which admits at least one solution under (10).
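To see why (12) is the right condition, note that under (11) the follower minimizes $J_2(u^d - Q(v - v^d), v)$ over $v$, whose gradient at $v^d$ is $\nabla_v J_2 - Q^* \nabla_u J_2$; by strict convexity, the stationary point is the unique minimizer. The sketch below checks this for $U = \mathbb{R}^2$, $V = \mathbb{R}$, using one particular solution of (12), namely $Q^* = \nabla_v J_2 \, \nabla_u J_2^{\top} / \|\nabla_u J_2\|^2$. The cost function, the desired pair, and this choice of solution are assumptions made for the example.

```python
def golden_min(h, lo, hi, tol=1e-9):
    """Minimize a unimodal scalar function h on [lo, hi] by golden-section search."""
    inv_phi = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if h(c) < h(d):
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d
            d = a + inv_phi * (b - a)
    return 0.5 * (a + b)


# Hypothetical strictly convex follower cost on R^2 x R (not from the paper).
def J2(u, v):
    u1, u2 = u
    return u1 ** 2 + u2 ** 2 + v ** 2 - 2.0 * v + u1 * v


def grad_J2(u, v):
    """Returns (gradient with respect to u, gradient with respect to v)."""
    u1, u2 = u
    return (2.0 * u1 + v, 2.0 * u2), 2.0 * v - 2.0 + u1


u_d, v_d = (1.0, 1.0), 0.0
g_u, g_v = grad_J2(u_d, v_d)
norm2 = sum(g * g for g in g_u)           # must be nonzero, cf. (10)
# One solution of (12): Q* acts as an inner product with the vector q.
q = tuple(g_v * g / norm2 for g in g_u)


def gamma1(v):
    """Affine incentive strategy (11); Q maps the scalar v to q * v."""
    return tuple(ud - qi * (v - v_d) for ud, qi in zip(u_d, q))


v_best = golden_min(lambda v: J2(gamma1(v), v), -10.0, 10.0)
print(q, v_best, gamma1(v_best))
```

Adding to `q` any vector orthogonal to `g_u` gives another solution of (12), which is one way to see that the operator is not unique.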

It should be noted that whenever a global minimum to
$J_1(u, v)$ exists on $U \times V$ (say, $(u^t, v^t)$), by letting
$(u^d, v^d) = (u^t, v^t)$ above in (11) and (12), the leader can force the
follower to minimize collectively the leader's cost functional
$J_1(u, v)$.

B.  Partial  Dynamic  Information 

If the leader does not have access to $v$, he cannot
necessarily enforce an arbitrary decision pair $(u^d, v^d) \in U \times V$
on the follower, and, in particular, $(u^t, v^t)$ is in
general not achievable. In fact, achievable solution pairs
will be elements of the product space $U \times Y$, with the best
achievable performance for the leader being [4]

$$\min_{U \times Y} \tilde{J}_1(u, y) \tag{13}$$

where

$$\tilde{J}_1(u, y) = J_1(u, v^*(u, y)) \tag{14}$$

$$v^*(u, y) = \arg \Big\{ \min_{v \in V} J_2(u, v) \ \text{subject to } Nv = y \Big\}. \tag{15}$$

Here we have tacitly assumed that in (15) the argument is
unique for every $(u, y) \in U \times Y$, which in fact holds
whenever $J_2(u, v)$ is strictly convex on $U \times V$ [28, Lemma
2]. Further introducing

$$\tilde{J}_2(u, y) = J_2(u, v^*(u, y))$$

it can be shown [28] that strict convexity of $J_2(u, v)$ implies
strict convexity of $\tilde{J}_2(u, y)$ on $U \times Y$, and hence the
incentive problem with partial dynamic information becomes
equivalent to one with perfect information, with
$(u, y) \in U \times Y$ being the decision variables and $\tilde{J}_i(u, y)$,
$i = 1, 2$, the cost functionals. Propositions 1-3 apply directly
to this transformed, or so-called "projected," problem,
provided that the desirable solution pair $(u^d, y^d)$ is
chosen out of $U \times Y$. In this context, a direct application
of Proposition 3 leads to affine optimal incentive policies

$$\gamma_1(y) = u^d - Q(y - y^d) \tag{16}$$

where $Q^*: U \to Y$ satisfies

$$\nabla_y \tilde{J}_2(u^d, y^d) = Q^* \nabla_u \tilde{J}_2(u^d, y^d) \tag{17}$$

provided that $\tilde{J}_2(u, y)$ is Fréchet differentiable on $U \times Y$
and $\nabla_u \tilde{J}_2(u^d, y^d) \neq 0$.
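The projection in (14) and (15) can be sketched for $V = \mathbb{R}^2$ with the leader observing only the sum $y = v_1 + v_2$, i.e., $N = [1 \ \ 1]$. The follower cost below and the parametrization of the constraint set by $v_1 = t$ are assumptions for this example; the closed-form expression is included only as a cross-check of the numerical minimization.

```python
def golden_min(h, lo, hi, tol=1e-9):
    """Minimize a unimodal scalar function h on [lo, hi] by golden-section search."""
    inv_phi = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if h(c) < h(d):
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d
            d = a + inv_phi * (b - a)
    return 0.5 * (a + b)


# Hypothetical strictly convex follower cost on R x R^2 (not from the paper).
def J2(u, v):
    v1, v2 = v
    return u ** 2 + v1 ** 2 + 2.0 * v2 ** 2 + u * v1


def v_star(u, y):
    """Follower's cheapest decision consistent with the observation v1 + v2 = y."""
    t = golden_min(lambda t: J2(u, (t, y - t)), -50.0, 50.0)
    return (t, y - t)


def v_star_closed_form(u, y):
    """Stationarity of t^2 + 2 (y - t)^2 + u t gives t = (4 y - u) / 6."""
    return ((4.0 * y - u) / 6.0, (2.0 * y + u) / 6.0)


def J2_projected(u, y):
    """Projected cost: J2 evaluated along v*(u, y)."""
    return J2(u, v_star(u, y))


print(v_star(1.2, 0.6), v_star_closed_form(1.2, 0.6), J2_projected(1.2, 0.6))
```

Once the projected costs are available as functions of $(u, y)$, the leader's design problem has the same form as in the perfect information case, with $y$ playing the role of $v$.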

Obviously, the operator $Q^*$ in either (12) or (17) is not
uniquely defined. Thus, there exist several candidates for
the solution of the incentive problem at our disposal to
satisfy some additional requirements. Some possible ways
of constructing the operator $Q^*$, with application examples
and other details on these approaches, can be found in [28].
Yet another possible selection criterion based on sensitivity
considerations has been presented and discussed in [10].

III. The Feedback Stackelberg Game with
Stagewise Information Advantage to the
Leader

As one application of the general results presented in the
previous section, we consider here a feedback dynamic
game in discrete-time, as described by the state evolution
(2) and with player $i$'s cost function given by $J_i^0(x_0, u_0^{N-1}, v_0^{N-1})$,
where

$$J_i^k(x(k), u_k^{N-1}, v_k^{N-1}) = \sum_{j=k}^{N-1} g_i(j, x(j), u(j), v(j)) + s_i(N, x(N))$$

$$u_k^{N-1} = \{u(k), u(k+1), \ldots, u(N-1)\}, \quad v_k^{N-1} = \{v(k), v(k+1), \ldots, v(N-1)\}$$

$$u = u_0^{N-1}, \quad v = v_0^{N-1},$$

$$u(k) \in \mathbb{R}^{m_1}, \quad v(k) \in \mathbb{R}^{m_2}, \quad x(k) \in \mathbb{R}^n, \quad k = 0, 1, \ldots, N-1. \tag{18}$$

We endow the leader with such an information pattern that
permits him to use incentive strategies under partial observation
of the follower's current actions; that is, letting

$$y_1(k) = y(k) = N_k(x(k), v(k)) \in Y_k \subset \mathbb{R}^p; \quad N_k: \mathbb{R}^n \times \mathbb{R}^{m_2} \to \mathbb{R}^p \tag{19}$$

we assume that permissible policies for the leader are Borel
measurable mappings

$$\gamma_1(k; \cdot): \mathbb{R}^n \times \mathbb{R}^p \to \mathbb{R}^{m_1} \tag{20}$$

so that

$$u(k) = \gamma_1[k; x(k), y(k)]. \tag{21}$$

For the follower, on the other hand, we assume that only
feedback state information is available, i.e.,

$$v(k) = \gamma_2[k; x(k)]. \tag{22}$$

What we envisage here is a decision making process wherein
the leader is dominant only stagewise, not only by announcing
his policy ahead of the follower but also by
incorporating partial information on the follower's current
action in his incentive strategy. More precisely, the rules
that underlie the game are as follows: At each stage
$k = 0, 1, \ldots, N-1$, the leader announces his strategy
$u(k) = \gamma_1(k; x(k), y(k))$ first, to which the follower reacts by
minimizing his stagewise cost function. This then determines
the values of $y(k)$, $u(k)$, $v(k)$ and $x(k+1)$ in
terms of $x(k)$, and transition to the next stage takes place.
Of course, while making decisions at each stage, the players
will have to anticipate their future moves and arrive at
their policies accordingly. At each stage a dynamic Stackelberg
game (incentive) problem of the type discussed in
Section II is solved, with the leader not only announcing
his policy ahead of the follower, but also having informational
advantage (partial information on the follower's
decision). We call such a game a "feedback Stackelberg
game with informational advantage to the leader" and the
associated solution concept the "feedback Stackelberg
solution with informational advantage to the leader"
(FSIA). Note that this solution concept coincides with the
standard feedback Stackelberg concept (cf. [24], [25]) in the
case $N_k[x(k), v(k)]$ is independent of $v(k)$.

We now discuss derivation of the FSIA for the finite
horizon multistage decision process formulated in this section.


Let us consider the last step decision problem starting
from $x(N-1)$, with only $u(N-1)$ and $v(N-1)$ to be
determined (the problem $(N-1)$). Following the discussion
of Section II-B, the best response of the follower to
fixed values of $x(N-1)$, $u(N-1)$ and
$y(N-1) = N_{N-1}(x(N-1), v(N-1))$ will be

$$v(N-1) = \arg \Big\{ \min_{v(N-1)} J_2^{N-1}(x(N-1), u(N-1), v(N-1)), \ v(N-1) \in N_{N-1}^{-1}[x(N-1), y(N-1)] \Big\} \triangleq v_{N-1}^*(x(N-1), u(N-1), y(N-1)) \tag{23}$$

where $N_{N-1}^{-1}(x, y) = \{v \in \mathbb{R}^{m_2} : N_{N-1}(x, v) = y\}$, thus
leading to the "projected" cost functionals

$$\tilde{J}_i^{N-1}[x(N-1), u(N-1), y(N-1)] = J_i^{N-1}[x(N-1), u(N-1), v_{N-1}^*(x(N-1), u(N-1), y(N-1))] \quad (i = 1, 2). \tag{24}$$

Therefore, the lowest cost value the leader can hope to
attain is

$$I_1^{N-1}(x(N-1)) = \min_{u(N-1),\, y(N-1)} \tilde{J}_1^{N-1}[x(N-1), u(N-1), y(N-1)]. \tag{25}$$

Let us assume that, for each $x(N-1) \in \mathbb{R}^n$, there exists a
unique solution $(u^t(N-1), y^t(N-1))$ to (25). (If the
solution to (25) is not unique, we adopt one of the possible
solutions according to some other consideration of preference
(for the leader); see [4] for a discussion on this point.)

Now introduce the counterpart of set (8), in this context,
which will depend explicitly on $x(N-1)$:

$$\Omega_{N-1}(x(N-1)) = \{(u, y) \in \mathbb{R}^{m_1} \times Y_{N-1} \mid \tilde{J}_2^{N-1}(x(N-1), u, y) \le \tilde{J}_2^{N-1}(x(N-1), u^t(N-1), y^t(N-1))\} \tag{26}$$

and let $\Omega_{N-1}^c(x(N-1))$ denote its complement. Then, we
have

Definition 1: For problem $(N-1)$, a state $x(N-1)$ is
called incentive controllable if either $Y_{N-1}$ is a singleton or
for any $y \in Y_{N-1}$, $y \neq y^t(N-1)$, there exists $u \in \mathbb{R}^{m_1}$
such that $(u, y) \in \Omega_{N-1}^c(x(N-1))$. Furthermore, if all
states $x(N-1) \in \mathbb{R}^n$ are incentive controllable, then the
problem $(N-1)$ is called completely incentive controllable.

Now, an existence result follows immediately from Proposition 1:

Proposition 4: Assume that problem $(N-1)$ is completely
incentive controllable. Then for each $x_{N-1}$, there
exists an incentive strategy $u(N-1) = \gamma_1[N-1; x(N-1), y(N-1)]$
which forces the follower to take the decision
$v(N-1) = v_{N-1}^*(x(N-1), u^t(N-1), y^t(N-1))$, with
the realized cost value for the leader being the minimum
value of $\tilde{J}_1^{N-1}$, i.e., $I_1^{N-1}(x(N-1))$.




Remark 2: If $y(N-1) = v(N-1)$, or
$v(N-1) = N_{N-1}^{-1}(x(N-1), y(N-1))$ exists uniquely, the leader has
complete access to the follower's decision and the problem
becomes one with perfect dynamic information (see Section II-A).
In this case the attainable lower bound for an
incentive controllable state $x_{N-1}$ is exactly the team solution

$$I_1^{N-1}(x(N-1)) = \min_{u(N-1),\, v(N-1)} J_1^{N-1}(x(N-1), u(N-1), v(N-1)) \tag{27}$$

which is obviously the absolute lower bound.

The result of Proposition 4 can now be applied recursively
by simply replacing $J_i^k$ with the cost-to-go function
$\bar{J}_i^k$ to be introduced below and by appropriately redefining
$I_i^k(x(k))$. Towards this end, let

$$\bar{J}_i^k(x(k), u(k), v(k)) = g_i(k, x(k), u(k), v(k)) + I_i^{k+1}(x(k+1)) \tag{28}$$

where

$$x(k+1) = f(k, x(k), u(k), v(k)) \tag{29}$$

and

$$I_1^{k+1}(x(k+1)) = \min_{u,\, y} \tilde{J}_1^{k+1}(x(k+1), u, y) = \tilde{J}_1^{k+1}(x(k+1), u^t(k+1), y^t(k+1)) \tag{30}$$

$$I_2^{k+1}(x(k+1)) = \tilde{J}_2^{k+1}(x(k+1), u^t(k+1), y^t(k+1)). \tag{31}$$

Construct

$$v_k^*(x(k), u(k), y(k)) = \arg \Big[ \min_{v} \bar{J}_2^k(x(k), u(k), v), \ v \in N_k^{-1}(x(k), y(k)) \Big] \tag{32}$$

$$\tilde{J}_i^k(x(k), u(k), y(k)) = \bar{J}_i^k(x(k), u(k), v_k^*(x(k), u(k), y(k))) \quad (i = 1, 2) \tag{33}$$

$$(u^t(k; x(k)), y^t(k; x(k))) = \arg \Big[ \min_{u,\, y} \tilde{J}_1^k(x(k), u, y) \Big] \tag{34}$$

$$I_i^k(x(k)) = \tilde{J}_i^k(x(k), u^t(k), y^t(k)) \quad (i = 1, 2) \tag{35}$$

$$\Omega_k(x(k)) = \{(u, y) \in \mathbb{R}^{m_1} \times Y_k \mid \tilde{J}_2^k(x(k), u, y) \le \tilde{J}_2^k(x(k), u^t(k), y^t(k))\}. \tag{36}$$

Then, the problem considered at stage $k$ has the projected
cost functions $\tilde{J}_1^k$ and $\tilde{J}_2^k$, with a lower bound on the former
given by $I_1^k(x(k))$. The following (recursive) definition
now paves the way for Proposition 5, the generalization of
Proposition 4.

Definition 2: The $(N-k)$-stage problem $(k)$ is called
completely incentive controllable, if
1) the corresponding problem $(k+1)$ is completely incentive
controllable; and 2) the equivalent one-stage incentive
problem (28)-(29) is completely incentive controllable
in the sense of Definition 1.

Proposition 5: For a completely incentive controllable
problem $(k)$, and for each starting state $x_k$, there exists an
optimal incentive strategy

$$u^*(k) = \gamma_1^*[k; x(k), y(k)] = \begin{cases} u^t(k; x(k)), & \text{when } y(k) = y^t(k; x(k)) \\ \text{any } u \in \mathbb{R}^{m_1} \text{ such that } (u, y) \in \Omega_k^c(x(k)), & \text{when } y(k) \neq y^t(k; x(k)) \end{cases} \tag{37}$$

that forces the follower to take the decision
$v(k) = v_k^*(x(k), u^t(k), y^t(k))$, with the realized cost value for the
leader being $I_1^k(x(k))$. This constitutes a FSIA solution for
the dynamic game problem considered in this section.

Remark 3: Equations (28)-(35) constitute the recurrence
relations between $I_i^k(x(k))$ and $I_i^{k+1}(x(k+1))$ ($i = 1, 2$).
This is the generalized optimality principle for the feedback
Stackelberg game problem with informational advantage
to the leader, under the assumption of complete
incentive controllability.

We now put some more structure on the underlying
spaces and functionals, in order to obtain some specific
results. The first set of such restraints and the main result
that ensues are the following.

Proposition 6: The feedback Stackelberg game is completely
incentive controllable if for each $x(k) \in \mathbb{R}^n$ and
$k = 0, \ldots, N-1$, $Y_k$ is a vector space and $\tilde{J}_2^k(x(k), u, y)$
is continuous and strictly convex in the pair
$(u, y) \in \mathbb{R}^{m_1} \times Y_k$.

Proof: Verification of this result involves a repeated
application of Proposition 2 in a routine way, and is
therefore omitted.

Corollary 1: When we construct the sequence
$\{\bar{J}_i^k(x(k), u(k), v(k))\}$ according to relations (28)-(35),
and recursively from $k = N-1$ backwards, if all
$\bar{J}_2^k(x(k), u(k), v(k))$ are continuous and strictly convex in
$u(k)$ and $v(k)$ for all $x(k) \in \mathbb{R}^n$ and $k \ge 0$, then the
problem always admits a FSIA solution, with one such
optimal incentive strategy given by (37).

The conditions of this corollary (and of Proposition 6)
are actually satisfied for a class of problems of practical
importance. Consider, for example, the following set of
sufficient conditions:

1) $s_2(N; x)$ is convex in $x \in \mathbb{R}^n$;

2) $g_2(k; x, u, v)$ is decomposable in the form
$g_2(k; x, u, v) = p_2(k; x) + q_2(k; u, v)$, where
$p_2(k; x)$ is convex in $x$ and $q_2(k; u, v)$ is strictly
convex in $(u, v)$;

3) $f(k; x, u, v)$ and $N_k(x, v)$ are affine in their arguments;

4) $u^t(k; x)$ and $y^t(k; x)$ are affine in $x$;

5) $v_k^*(x, u, y)$ is affine in $x$, $u$ and $y$.

These guarantee satisfaction of the hypotheses of the
corollary. One such special class is the linear quadratic
problem where

$$s_i(N; x) = \langle x, Q_N^i x \rangle \quad (Q_N^i \ge 0)$$

$$g_i(k, x, u, v) = \langle x, Q_k^i x \rangle + \langle u, R_k^i u \rangle + \langle v, S_k^i v \rangle \quad (Q_k^i \ge 0, \ R_k^i > 0, \ S_k^i > 0)$$

where $\langle \cdot, \cdot \rangle$ denotes appropriate inner products in vector
spaces:

$$f(k, x, u, v) = A_k x + B_k u + C_k v$$

$$N_k(x, v) = N_k v$$

where $A_k$, $B_k$, $C_k$, $N_k$ are matrices of appropriate dimensions,
with $N_k$ being of full rank.

It readily follows from Proposition 3 that in this case the
FSIA solutions are not only of the type (37), but can also
be taken to be affine, in which case

$$u^t(k) = L_1(k) x(k), \quad y^t(k) = L_2(k) x(k)$$

$$v_k^*(x, u, y) = M_1(k) x + M_2(k) u(k) + M_3(k) y(k)$$

$$u^*(k) = \gamma_1^*[k; x(k), y(k)] = L_1(k) x(k) - Q(k)[y(k) - L_2(k) x(k)]$$

with capital letters denoting matrices of appropriate dimensions,
and $Q(k)$ being a gain matrix whose transpose
satisfies a gradient equation of the type (12), for each
$k \ge 0$. Explicit expressions for these matrices can be obtained
by basically solving (28)-(35) recursively, and by
noting that $\tilde{J}_i^k$ and $I_i^k$ are quadratic functionals for each
$k \ge 0$.

Remark 4: The preceding results find natural extensions
to the class of problems wherein the control and measurement
spaces are arbitrary (infinite dimensional) Banach
spaces, instead of being finite dimensional. Particularly, for
the linear-quadratic problem discussed above, the same
affine structure prevails provided that we interpret the
inner-products appropriately and replace all matrices with
linear operators. Such a result, then, would be applicable to
continuous-time dynamic games in which the decision
makers have access to sampled information and the feedback
solution is defined in between different sampled
subintervals.

Remark 5: Under the conditions of Proposition 4 and
Corollary 1, and when the leader has perfect access to the
follower's decision variable at each stage, the affine FSIA
solution also has a robustness feature in the sense that its
truncated version constitutes a FSIA solution to a dynamic
feedback game of shorter duration, defined on the interval
$[k, N-1]$, for any $k > 0$. This result is a direct consequence
of the fact that the trajectory corresponding to the
original FSIA solution satisfies the principle of optimality
(being the team solution from the leader's point of view)
and the leader's affine FSIA strategy employs only current
state information.

dynamic games of the type introduced in §II-B, and under
the closed-loop information pattern. Here, the leader will
not have any stagewise informational advantage over the
follower, but he will still dominate the decision process by
announcing his strategy ahead of time and enforcing it on
the follower, in accordance with the solution concept (3),
(4). Furthermore, because of its appealing features, we
restrict attention to those strategies for the leader that are
linear in the dynamic part of the information, and also
assume, without any loss of generality, that the follower
employs only open-loop policies (which does not lead to
any degradation in his performance; see, e.g., [7]).

Let  J,  and  J2  be  appropriate  cost  functionals  for  placers 
1  and  2,  which,  for  fixed  initial  stale  x,,  e  R".  can  a!wa;-s 
be  rewritten  (by  elimination  of  the  state  variable)  as  func¬ 
tions  of  solely  the  decision  variables  (u.  r)  €  i  x  I’  (see 
Section  I-B  for  notation).  Since  every  discrete  or  continu¬ 
ous-time  dynamic  game  can  be  expressed  in  this  form,  the 
analyses  and  results  of  Section  II  are  directly  applicable 
here  provided  that  the  corresponding  optimal  incentive 
strategy  for  the  leader  is  permissible,  i.e..  it  is  causal  and 
satisfies  the  additional  structural  restrictions  that  max  Iv. 
imposed  on  elements  of  T,.  Specifically,  let  us  assume  that 

1)  Through  the  closed-loop  state  information,  the  leader 
is  able  to  infer  perfectly  the  past  values  of  r<  j.  the 
decision  function  of  the  follower. 

2)  J2Ut.  r )  is  Frechet-differentiable  and  strictlv  convex 
on  U  x  V 

3)  A  global  minimum  to  7,(  u.  r)  exists  on  {  *•  J  .  whu  i: 
we  denote  bv  ( a’.  r‘  )  S  L  x  J'.  and  which  is  adopted  as  a 
desirable  solution  bv  the  leader. 

4)  \t  this  solution  point. 

VUJ:(  u*.  /•'  )  *  0.  (3s  i 

Then,  we  know  from  Proposition  3  that,  in  the  absence  of 
causality,  every  optimal  affine  Stackelberg  solution  can  be 
written  as 

u  =  y,(  r )  =  u'  -  (3(  r  -r)  I  -'u) 

where  the  adjoint  of  Q.Qm‘.  i  -*  f  .  satisfies 

V,  JA  m*.  r‘  )  =  Q*V„JA  a',  r'  I  (401 

Now.  the  real  question  here  is  whether  we  can  find  an 
operator  Q  whose  adjoint  satisfies  (4l)|  above,  and  which  is 
further  causal  and  leads  to  a  police  y,.  as  given  bv  (34). 
belonging  to  a  given  closed-loop  police  space  !',.  We  'how 
below  that,  under  Ihe  closed-loop  pattern  and  for  both 
discrete  and  continuous-time  problems  sutisl'suig  ap¬ 
propriate  structural  assumptions,  such  a  linear  operator 
can  be  contracted. 

Towards  this  end.  we  first  introduce  some  notation.  Let 
the  inner-product  of  two  elements  M  i  and  cm  i  m  /. ’"(<!.  /  ) 
be  defined  be 

/ .  e  *  f '  n  Dvii )  Ji  I  e  '< 1 •  n  / 1  Ji  i4|  i 

and  further  introduce  the  notation 

j  i  ( i  lei  / )  Jt  j  e'(  m  II 1 1  iii  i  42  1 


In  this  section  we  turn  our  attention  to  ihe  global 
Stackelberg  solution  in  both  discrete  and  contmuous-iime 


I  ■  V  , , 
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where  0  $  tx  ^  r;  T.  Similarly,  for  discrete-time  process¬ 
es.  the  inner-product  of  /( ■ ).  g(  ■ )  e  /"'[O,  V  ]  is  defined  by 

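$$\langle f, g \rangle = \sum_{i=0}^{N} f'(i) g(i) = \sum_{i=0}^{N} g'(i) f(i) \tag{43}$$

and furthermore

$$\langle f, g \rangle_{[k, l]} = \sum_{i=k}^{l} f'(i) g(i) = \sum_{i=k}^{l} g'(i) f(i) \tag{44}$$

where $k$, $l$ are integers, $0 \le k \le l \le N$. Now introduce

$$\phi(\cdot) = \nabla_u J_2(u, v), \qquad \psi(\cdot) = \nabla_v J_2(u, v) \tag{45}$$

where the gradients are evaluated at some specific values of $u \in U$ and $v \in V$ that will be clear from the context.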

To reveal a property of $\phi(\cdot)$ and $\psi(\cdot)$ which is vital in the construction of affine incentive strategies (cf. (39)), let us consider the variation in $J_2$ resulting from, for example, an infinitesimal variation $\delta u(\cdot)$ in $u(\cdot)$:

$$\delta J_2 = \langle \nabla_u J_2(u, v), \delta u(\cdot) \rangle = \int_0^T \phi'(t)\, \delta u(t)\, dt \quad \text{or} \quad = \sum_{i=0}^{N} \phi'(i)\, \delta u(i). \tag{46}$$

Thus, the value of $\phi(\cdot)$ at time $t$ simply represents the local sensitivity of $J_2$ with respect to $u(t)$, in other words, the ability of the leader to influence $J_2$ by changing his decision variable $u(t)$ at time $t$. Likewise, the time function $\psi(\cdot) = \nabla_v J_2(u, v)$ represents the follower's ability to influence his cost functional $J_2$ by changing $v(t)$, the value of $v(\cdot)$ at time $t$. Hence, they can be referred to as "sensitivity functions" representing the sensitivity of $J_2$ to the players' actions, which may be taken as a measure of the players' control ability in the related optimization problems.

Of course, when we speak of "changing" or "influence" as above, we use these terms in the sense of "infinitesimal variations" or the "first-order approximation." Thus, they make sense only in a small neighborhood of a specific point $(u, v) \in U \times V$.

Hence, in the absence of a causality restriction, the results of Section II admit an explicit "physical interpretation." The only condition for existence of an affine incentive solution to the Stackelberg dynamic game is that the sensitivity of $J_2$ with respect to $u(\cdot)$ should not be zero (cf. Proposition 3, and also (38)). That is, whenever the leader is able to influence the follower's cost functional (infinitesimally), he can always force the follower to choose the prescribed value for his decision variable.
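As a hedged illustration of this interpretation (a sketch, not a restatement of Proposition 3), write $\phi = \nabla_u J_2$ and $\psi = \nabla_v J_2$ for the two gradients evaluated at the desired point. When causality is ignored, one operator whose adjoint maps $\phi$ to $\psi$ is the rank-one choice below; it is well defined exactly when $\langle \phi, \phi \rangle \ne 0$, i.e., when the leader's sensitivity does not vanish:

$$Q^* f = \psi\, \frac{\langle \phi, f \rangle}{\langle \phi, \phi \rangle}, \qquad Q g = \phi\, \frac{\langle \psi, g \rangle}{\langle \phi, \phi \rangle}, \qquad Q^* \phi = \psi .$$

The pair is adjoint because $\langle Q^* f, g \rangle = \langle \phi, f \rangle \langle \psi, g \rangle / \langle \phi, \phi \rangle = \langle f, Q g \rangle$. This particular $Q$ generally uses the whole time history of $g$, so it is not causal.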

Now, when the leader is faced with the additional constraint that his control at time $t$ cannot depend on the "future values" of $v(\cdot)$, the operator $Q$ in the incentive strategy (39) should be a causal operator (or equivalently, $Q^*$ satisfying (40) should be anticausal). Moreover, if the leader needs a nonzero time duration $\varepsilon$ to infer the necessary information on $v(t)$ from the current observation, the control $u(t)$ can only depend on the value of $v(\tau)$ for $\tau \le t - \varepsilon$; such an operator $Q$ will be called $\varepsilon$-strong causal (and $Q^*$ $\varepsilon$-strong anticausal).

Let us first introduce

$$\Phi(t) = \langle \phi, \phi \rangle_{[t, T]} \quad \text{or} \quad \langle \phi, \phi \rangle_{[t, N]} \tag{47}$$

and

$$\Psi(t) = \langle \psi, \psi \rangle_{[t, T]} \quad \text{or} \quad \langle \psi, \psi \rangle_{[t, N]}. \tag{48}$$

If $\Phi(t) = 0$, i.e., $\phi(\tau) = 0$, $\tau \ge t$ (recall that, for functions $f(t)$, $g(t)$ in Hilbert space, $f(\cdot) = g(\cdot)$ means $f(t) = g(t)$ for almost all $t \in [0, T]$, except perhaps on some set of measure zero; this fact has to be noted throughout the paper), then the leader cannot control the situation during $\tau \in [t, T]$. If, concurrently, $\Psi(t) \ne 0$, the follower can change the value of $J_2(u, v)$ by infinitesimal variations in $v(\cdot)$. Thus, it is intuitively evident that the leader may not be able to enforce any desired decision pair $(u^d, v^d)$ by a causal incentive strategy, because he cannot respond effectively to the variation in the follower's decision, even though he may be able to detect it.

To put the above intuitive reasoning into precise form, we first prove for continuous-time systems the following result.

Lemma 1: For any $\phi(\cdot) \in U = L_2^{m_1}[0, T]$ and $\psi(\cdot) \in V = L_2^{m_2}[0, T]$, a set of sufficient conditions for existence of an anticausal bounded linear operator $Q^*: U \to V$ satisfying (40), which can be rewritten as $Q^* \phi = \psi$, is the following:

a) For all $t \in [0, T]$, $\Psi(t) \ne 0$ implies $\Phi(t) \ne 0$. (Let $t_\phi$ be the smallest time such that $\Phi(t_\phi) = 0$, and $t_\psi$ be the smallest time making $\Psi(t) = 0$; then the condition says $t_\phi \ge t_\psi$.)

b) When $t_\phi = t_\psi$, the following integral exists and remains bounded:

$$\int_0^{t_\phi} \frac{\psi'(t) \psi(t)}{\Phi(t)}\, dt < \infty. \tag{49}$$

(This second condition means that the follower's "control ability," measured in terms of $\psi(t)$, cannot be much stronger than the leader's, at the point $t_\phi = t_\psi$ when they concurrently lose their control ability.)
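To make condition b) concrete, here is a small worked example. The scalar functions below are an illustrative assumption, not taken from the paper. On $[0, 1]$ take $\phi(t) \equiv 1$ and $\psi(t) = (1 - t)^{\alpha}$ with $\alpha > -1/2$ (so that $\psi$ is square integrable). Then

$$\Phi(t) = \int_t^1 \phi^2(s)\, ds = 1 - t, \qquad \Psi(t) = \int_t^1 \psi^2(s)\, ds = \frac{(1 - t)^{2\alpha + 1}}{2\alpha + 1},$$

so $t_\phi = t_\psi = 1$ and condition a) holds, while

$$\int_0^1 \frac{\psi^2(t)}{\Phi(t)}\, dt = \int_0^1 (1 - t)^{2\alpha - 1}\, dt = \begin{cases} \dfrac{1}{2\alpha} & (\alpha > 0) \\ \infty & (\alpha \le 0). \end{cases}$$

Under these assumptions condition b) holds only when $\psi$ decays to zero at the common terminal point; a follower sensitivity that stays bounded away from zero there (for instance $\alpha = 0$) violates it.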

Proof: The lemma can be proved simply by giving one of the possible solutions for the operator $Q^*$, which is

$$Q^*[f(\tau)](t) = \begin{cases} \displaystyle \int_t^{t_\phi} \frac{\psi(t) \phi'(\tau)}{\Phi(t)} f(\tau)\, d\tau & (t < t_\phi) \\ 0 & (t \ge t_\phi). \end{cases} \tag{50}$$

Thus, it is an integral operator with kernel

$$K(t, \tau) = \begin{cases} \dfrac{\psi(t) \phi'(\tau)}{\Phi(t)} & (t \le \tau < t_\phi) \\ 0 & (\text{otherwise}) \end{cases} \tag{51}$$

(see, e.g., [29, p. 67]). Note that

$$\|K\|^2 = \int_0^T \!\! \int_0^T \mathrm{Tr}[K(t, \tau) K'(t, \tau)]\, d\tau\, dt = \int_0^{t_\phi} \!\! \int_t^{t_\phi} \mathrm{Tr}\left[ \frac{\psi(t) \phi'(\tau) \phi(\tau) \psi'(t)}{\Phi^2(t)} \right] d\tau\, dt = \int_0^{t_\phi} \frac{\psi'(t) \psi(t)}{\Phi^2(t)} \int_t^{t_\phi} \phi'(\tau) \phi(\tau)\, d\tau\, dt = \int_0^{t_\phi} \frac{\psi'(t) \psi(t)}{\Phi(t)}\, dt < \infty \tag{52}$$

and therefore $Q^*$ is well-defined and bounded, with $\|Q^*\|^2 \le \|K\|^2$. It is anticausal, since the value of $Q^*[f(\tau)]$ at time $t$ depends only on the values of $f(\tau)$ for $\tau \in [t, t_\phi)$. Finally it is straightforward to verify that $Q^*[\phi(\cdot)](t) = \psi(t)$, except perhaps at times $t$ belonging to a set of measure zero.

Remark 6: The operator $Q$, being the adjoint of the anticausal operator $Q^*$, is a causal operator. The adjoint of (50) can readily be computed to be

$$Q[g(t)](\tau) = \int_0^{\tau} K'(t, \tau) g(t)\, dt = \begin{cases} \displaystyle \phi(\tau) \int_0^{\tau} \frac{\psi'(t) g(t)}{\Phi(t)}\, dt & (\tau < t_\phi) \\ 0 & (\tau \ge t_\phi). \end{cases} \tag{53}$$

The lemma can easily be generalized to the case when $Q^*: U \to V$ is required to be an $\varepsilon$-strong anticausal operator. The sufficient conditions become, in this case, the following:

i) whenever $\Psi(t) = \langle \psi, \psi \rangle_{[t, T]} \ne 0$, we must have $\Phi(t + \varepsilon) = \langle \phi, \phi \rangle_{[t + \varepsilon, T]} \ne 0$; that is, $t_\phi \ge t_\psi + \varepsilon$. This implies that $\Psi(t) = 0$ for all $t \ge T - \varepsilon$; and

ii) when $t_\phi = t_\psi + \varepsilon$, the following integral exists and remains bounded:

$$\int_0^{t_\psi} \frac{\psi'(t) \psi(t)}{\Phi(t + \varepsilon)}\, dt. \tag{54}$$

In this case the kernel corresponding to (51) is

$$K(t, \tau) = \begin{cases} \dfrac{\psi(t) \phi'(\tau)}{\Phi(t + \varepsilon)} & (t + \varepsilon \le \tau < t_\phi) \\ 0 & (\text{otherwise}). \end{cases} \tag{55}$$

Moreover, the counterparts of (50) and (53), in this case, are, respectively,

$$Q^*[f(\tau)](t) = \begin{cases} \displaystyle \int_{t + \varepsilon}^{t_\phi} \frac{\psi(t) \phi'(\tau)}{\Phi(t + \varepsilon)} f(\tau)\, d\tau & (t < t_\phi - \varepsilon) \\ 0 & (\text{otherwise}) \end{cases} \tag{56}$$

and

$$Q[g(t)](\tau) = \begin{cases} \displaystyle \int_0^{\tau - \varepsilon} \frac{\phi(\tau) \psi'(t)}{\Phi(t + \varepsilon)} g(t)\, dt & (\varepsilon \le \tau < t_\phi) \\ 0 & (\tau < \varepsilon \text{ or } \tau \ge t_\phi). \end{cases} \tag{57}$$

The counterpart of Lemma 1 for the discrete-time case is much simpler, since Condition b) is then implied by Condition a).

Lemma 2: For any $\phi(\cdot) \in U = l_2^{m_1}[0, N-1]$, $\psi(\cdot) \in V = l_2^{m_2}[0, N-1]$, a sufficient condition for existence of a one-step-strong anticausal linear operator $Q^*: U \to V$ such that $Q^* \phi = \psi$ is that

iii) whenever $\Psi(k) = \sum_{i=k}^{N-1} \psi'(i) \psi(i) \ne 0$, we must have $\Phi(k+1) = \sum_{i=k+1}^{N-1} \phi'(i) \phi(i) \ne 0$; that is, $k_\phi \ge k_\psi + 1$.¹

The proof of this lemma is similar to that of Lemma 1 and is therefore omitted. The corresponding linear operators are

$$Q^*[f(i)](k) = \begin{cases} \displaystyle \sum_{i=k+1}^{N-1} \frac{\psi(k) \phi'(i)}{\Phi(k+1)} f(i) & (k < k_\phi - 1) \\ 0 & (k \ge k_\phi - 1) \end{cases} \tag{58}$$

and

$$Q[g(k)](i) = \begin{cases} \displaystyle \sum_{k=0}^{i-1} \frac{\phi(i) \psi'(k)}{\Phi(k+1)} g(k) & (0 < i < k_\phi) \\ 0 & (i = 0 \text{ or } i \ge k_\phi) \end{cases} \tag{59}$$

which are one-step strong anticausal and one-step strong causal, respectively. The general conclusion we derive from these two lemmas is the following:

Proposition 7: For the general Stackelberg dynamic game problem (Section I-B) with $U = L_2^{m_1}[0, T]$, $V = L_2^{m_2}[0, T]$ or $U = l_2^{m_1}[0, N-1]$, $V = l_2^{m_2}[0, N-1]$, in addition to the assumptions 1)-4) made in this section, let conditions a) and b) of Lemma 1 (or correspondingly, condition iii) of Lemma 2) be satisfied, and let the leader have perfect access to the past values of the follower's control variable (by possibly inferring these values perfectly through the observation of the state). Then the operator $Q$ defining the affine incentive strategy

$$u = \gamma_1(v) = u^t - Q[v - v^t] \tag{60}$$

where

$$(u^t, v^t) = \arg\min J_1(u, v), \tag{61}$$

can be chosen as a causal operator (correspondingly, a one-step strong-causal operator). One of its possible forms is given by (53) (or, correspondingly, by (59)), and it provides a global Stackelberg solution to the problem.

¹ Here $k_\phi$ and $k_\psi$ are the discrete-time counterparts of those introduced in Lemma 1.
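The discrete-time construction can be sketched numerically. The following minimal example assumes scalar controls and uses made-up sensitivity sequences `phi` and `psi` (hypothetical data, chosen only so that the condition of Lemma 2 holds). It builds the strictly lower triangular gain matrix of a one-step strong causal operator with entries $\phi(i)\psi(k)/\Phi(k+1)$ and checks that its transpose maps `phi` to `psi`. The helper names are illustrative, not from the paper.

```python
from fractions import Fraction

def tail_energy(seq):
    """Return E with E[k] = sum of seq[i]**2 for i >= k, and E[len(seq)] = 0."""
    n = len(seq)
    out = [Fraction(0)] * (n + 1)
    for k in range(n - 1, -1, -1):
        out[k] = out[k + 1] + seq[k] * seq[k]
    return out

def causal_gain_matrix(phi, psi):
    """Gains q[i][k] = phi(i) * psi(k) / Phi(k+1) for k < i (scalar case)."""
    n = len(phi)
    big_phi = tail_energy(phi)
    q = [[Fraction(0)] * n for _ in range(n)]
    for i in range(1, n):
        for k in range(i):
            if big_phi[k + 1] != 0:  # otherwise the leader has lost control
                q[i][k] = phi[i] * psi[k] / big_phi[k + 1]
    return q

def apply_adjoint(q, f):
    """Apply the adjoint (matrix transpose) of q to the sequence f."""
    n = len(f)
    return [sum(q[i][k] * f[i] for i in range(n)) for k in range(n)]

# Hypothetical sensitivities: psi vanishes at the last stage while phi does not.
phi = [Fraction(v) for v in (2, 1, 1, 3)]
psi = [Fraction(v) for v in (1, -2, 4, 0)]
q = causal_gain_matrix(phi, psi)
adjoint_image = apply_adjoint(q, phi)
```

Because `q` has zeros on and above the diagonal, the control at stage `i` depends only on the follower's deviations at stages `0, ..., i-1`, which is the one-step delay required of the leader.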

Remark: For the discrete-time case, a very simple and useful version of the sufficient conditions is that at the last decision stage

$$\phi(N) \ne 0 \quad \text{and} \quad \psi(N) = 0.$$

The causal incentive solution obtained above offers us a possible way of constructing the "closed-loop" solution to the Stackelberg dynamic game problem, provided that $v(\cdot)$ can be reconstructed from the observed state information in real time. If, for example, there is a causal operator $H$ such that

$$v(\cdot) = H x(\cdot) \tag{62}$$

then we have the closed-loop solution

$$u(\cdot) = u^t(\cdot) - Q[H x(\cdot) - v^t(\cdot)] \tag{63}$$

which is physically realizable. More specific derivations along this line are provided in the next section.

We should note that when the $\varepsilon$-strong causality conditions i) and ii) are taken instead of a) and b), the statement of Proposition 7 can be modified in a straightforward manner, which then says that affine $\varepsilon$-strong causal solutions exist. These may be used in realizing the optimum Stackelberg strategy of the leader, with an $\varepsilon$-delay in the reconstruction of $v(\cdot)$ from the state observation $x(\cdot)$.

Finally, we should remark that the results of this section, in particular those of Lemmas 1, 2, and Proposition 7, can be extended to the case when the leader has only partial state information and/or partial dynamic information on the follower's actions, without much difficulty and with only minor modifications. This extension involves, basically, the derivation of an achievable desirable solution $(u^t, v^t)$ to replace (61) (cf. Section II-B), a "projected" cost functional $J_2(u, v)$ for the follower, and rewording of Lemmas 1, 2 and Proposition 7 in terms of this new notation. We do not pursue this point here; see, however, the specific problem solved in Section V-B.

V. Applications and Examples

In  this  section,  the  concepts  and  results  presented  in 
Sections  II  and  IV  will  be  applied  to  some  special  cases  of 
practical  interest.  Some  numerical  examples  will  be  given 
to  show  the  applicability  of  the  theory  and  the  general 
approach. 

A. Causal Stackelberg Solution to Discrete-Time Linear Quadratic Dynamic Game Problems

One of the important subclasses of problems widely discussed in the literature (see, e.g., [9], [27]) is the discrete-time dynamic Stackelberg game with linear state equation

$$x(k+1) = A(k) x(k) + B(k) u(k) + C(k) v(k) \qquad (k = 0, 1, \ldots, N-1) \tag{64}$$

and quadratic cost functionals

$$J_i = x'(N) Q_i(N) x(N) + \sum_{j=0}^{N-1} \{ x'(j) Q_i(j) x(j) + u'(j) R_i(j) u(j) + v'(j) S_i(j) v(j) \} \tag{65}$$

where $i = 1, 2$ refer to the leader and the follower, respectively.

The approach presented in the previous section can be used in obtaining a causal solution to this problem under the closed-loop information pattern. Here we give only a numerical example to illustrate the method.

Example 1:

$$x(k+1) = x(k) + u(k) + v(k) \qquad (k = 0, 1, \ldots, N-2)$$

$$x(N) = x(N-1) + u(N-1)$$

$$J_1 = x^2(N) + \sum_{k=0}^{N-1} \left( x^2(k) + 2u^2(k) + v^2(k) \right)$$

$$J_2 = x^2(N) + \sum_{k=0}^{N-1} \left( x^2(k) + u^2(k) + 3v^2(k) \right).$$

In accordance with the method presented in Sections II and IV, first the team solution of minimizing $J_1$ is obtained from the standard Riccati recurrence relations, which involves the value function $x'(k) P(k) x(k)$, the corresponding team optimal controls $u^t(k)$ and $v^t(k)$, and optimal trajectory $x^t(\cdot)$. Then the gradients $\phi(k) = \nabla_{u(k)} J_2$ and $\psi(k) = \nabla_{v(k)} J_2$ at the desired team solution (written here without the common factor 2, which cancels in $Q^* \phi = \psi$) are derived from the dynamic equations as

$$\phi(k) = \sum_{i=k+1}^{N} x^t(i) + u^t(k) \qquad (k = 0, 1, \ldots, N-1)$$

$$\psi(k) = \sum_{i=k+1}^{N} x^t(i) + 3v^t(k) \qquad (k = 0, 1, \ldots, N-2), \qquad \psi(N-1) = 0.$$

Note that $\Phi(k) = \sum_{i=k}^{N-1} \phi^2(i)$, and from (60) and (59) the optimal Stackelberg strategy for the leader is

$$u(0) = u^t(0), \qquad u(i) = u^t(i) - \phi(i) \sum_{k=0}^{i-1} \frac{\psi(k)}{\Phi(k+1)} \left[ x(k+1) - x(k) - u(k) - v^t(k) \right] \qquad (i = 1, \ldots, N-1).$$

The values of these coefficients for the case $N = 4$ are listed in Table 1, with the corresponding policies being

$$u(1) = u^t(1) + 11.340855\,(v(0) - v^t(0))$$

$$u(2) = u^t(2) + 3.6583393\,(v(0) - v^t(0)) + 10.68982\,(v(1) - v^t(1))$$

$$u(3) = u^t(3) + 1.4633326\,(v(0) - v^t(0)) + 4.2759191\,(v(1) - v^t(1)) + 10.003726\,(v(2) - v^t(2)).$$

(The gains $\phi(i)\psi(k)/\Phi(k+1)$ are negative here, since $\phi$ and $\psi$ have opposite signs in Table 1, so that $-Q$ contributes positive coefficients.) Here

$$v(0) - v^t(0) = x(1) - x(0) - u^t(0) - v^t(0)$$

$$v(1) - v^t(1) = x(2) - x(1) - u(1) - v^t(1)$$

$$v(2) - v^t(2) = x(3) - x(2) - u(2) - v^t(2).$$

This is a causal closed-loop Stackelberg solution which achieves the globally optimal team solution.

TABLE 1

k                      0            1            2            3            4
P(k)                   1.4576075    1.4592593    1.4761906    1.6666667    1.000000
u^t(k)/x^t(k)         -0.2288037   -0.2296296   -0.2380953   -0.3333333    -
v^t(k)/x^t(k)         -0.4576074   -0.4592593   -0.4761905    0            -
x^t(k)/x^t(k-1)        -            0.3135888    0.3111111    0.2857143    0.6666667
x^t(k)/x(0)            1            0.3135888    0.097561     0.0278746    0.018583
phi(k)/x(0)            0.2288037    0.0720093    0.0232288    0.0092915    -
psi(k)/x(0)           -0.9152148   -0.2880372   -0.0929152    0            -
Phi(k)/x^2(0)          0.0581624    0.0058112    0.0006259    0.0000863    -
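The computation in Example 1 can be reproduced with a short script. This is a minimal sketch under the stated model: scalar dynamics with $v$ not entering the last transition, and the two cost functionals above. It uses exact rational arithmetic, and the scalar Riccati steps are written out by hand for this specific problem rather than taken from the paper. The function names are illustrative. The last function evaluates the follower's cost when the leader uses the affine policy, which allows a direct check that deviating from $v^t$ does not pay.

```python
from fractions import Fraction

N = 4  # number of stages, as in Table 1

def team_solution(x0):
    """Backward Riccati recursion for J_1, then the forward team trajectory."""
    P = [Fraction(0)] * (N + 1)
    a = [Fraction(0)] * N  # ratio u^t(k) / x^t(k)
    b = [Fraction(0)] * N  # ratio v^t(k) / x^t(k)
    P[N] = Fraction(1)
    for k in range(N - 1, -1, -1):
        if k == N - 1:  # v(N-1) does not enter x(N), so v^t(N-1) = 0
            a[k] = -P[k + 1] / (2 + P[k + 1])
            P[k] = 1 + 2 * P[k + 1] / (2 + P[k + 1])
        else:           # first-order conditions give v = 2u at these stages
            a[k] = -P[k + 1] / (2 + 3 * P[k + 1])
            b[k] = 2 * a[k]
            P[k] = 1 + 2 * P[k + 1] / (2 + 3 * P[k + 1])
    x = [Fraction(x0)]
    for k in range(N):
        x.append(x[k] * (1 + a[k] + b[k]))
    u = [a[k] * x[k] for k in range(N)]
    v = [b[k] * x[k] for k in range(N)]
    return P, x, u, v

def incentive_gains(x, u, v):
    """Gains g[i][k] = phi(i) * psi(k) / Phi(k+1) for k < i."""
    tail = [sum(x[k + 1:]) for k in range(N)]
    phi = [tail[k] + u[k] for k in range(N)]
    psi = [tail[k] + 3 * v[k] for k in range(N - 1)] + [Fraction(0)]
    big_phi = [sum(p * p for p in phi[k:]) for k in range(N + 1)]
    g = [[Fraction(0)] * N for _ in range(N)]
    for i in range(1, N):
        for k in range(i):
            g[i][k] = phi[i] * psi[k] / big_phi[k + 1]
    return g

def follower_cost(v, x0, ut, vt, g):
    """J_2 when the leader plays u(i) = u^t(i) - sum_k g[i][k] (v(k) - v^t(k))."""
    x, cost = Fraction(x0), Fraction(0)
    for k in range(N):
        uk = ut[k] - sum(g[k][j] * (v[j] - vt[j]) for j in range(k))
        cost += x * x + uk * uk + 3 * v[k] * v[k]
        x = x + uk + (v[k] if k < N - 1 else 0)
    return cost + x * x

P, x, ut, vt = team_solution(1)
gains = incentive_gains(x, ut, vt)
team_cost = follower_cost(vt, 1, ut, vt, gains)
```

In exact arithmetic the gain multiplying $v(2) - v^t(2)$ in $u(3)$ comes out as exactly 10 in magnitude, so the small discrepancies in the decimal coefficients quoted above are consistent with rounding in the original computation.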

B. The Linear Quadratic Infinite-Time Stackelberg Problem

Consider the continuous-time problem formulated by

$$\dot{x} = Ax + Bu + Cv, \qquad x(t) \in R^n, \quad u(t) \in R^{m_1}, \quad v(t) \in R^{m_2}, \quad x(0) = x_0, \quad t \ge 0; \tag{66}$$

$$J_i = \int_0^{\infty} (x' Q_i x + u' R_i u + v' S_i v)\, dt \qquad (i = 1, 2) \tag{67}$$

where $B$ and $C$ have full-column rank $m_1$ and $m_2$, respectively, $(A, (B : C))$ is controllable, $(Q_1, A)$ is observable, $Q_i \ge 0$, $R_1 > 0$, $S_i > 0$, $R_2 \ge 0$. The team solution that minimizes $J_1$ is

$$u^t = -R_1^{-1} B' P x^t, \qquad v^t = -S_1^{-1} C' P x^t, \qquad J_1^t = x_0' P x_0 \tag{68}$$

where $P$ is the unique positive definite solution of the algebraic Riccati equation

$$P[B R_1^{-1} B' + C S_1^{-1} C'] P - A'P - PA - Q_1 = 0 \tag{69}$$

and the optimal trajectory $x^t$ satisfies

$$\dot{x}^t = A_c x^t = [A - (B R_1^{-1} B' + C S_1^{-1} C') P] x^t, \qquad x^t(0) = x_0. \tag{70}$$

We now attempt to solve this problem under two different causal functional dependences for the leader's policy, viz. $\gamma_1: V \to U$; $\gamma_1(v) = u^t - Q(v - v^t)$ and $\gamma_1: X \to U$; $\gamma_1(x) = u^t - Q(x - x^t)$, where $Q$ is, in each case, a linear causal operator.

In the former case, we first calculate the gradients of $J_2$ with respect to $u$ and $v$ (see Appendix A) and arrive at

$$\phi(t) = \nabla_u J_2(u^t, v^t) = 2M x^t(t) \tag{71}$$

$$\psi(t) = \nabla_v J_2(u^t, v^t) = 2N x^t(t) \tag{72}$$

where

$$M = (B' L_0 - R_2 R_1^{-1} B' P) \tag{73}$$

$$N = (C' L_0 - S_2 S_1^{-1} C' P) \tag{74}$$

and $L_0$ is the solution of the matrix equation

$$A' L_0 + L_0 A_c + Q_2 = 0. \tag{75}$$

If a constant matrix gain solution is desired, then we must have

$$Q'M = N. \tag{76}$$

Unless Range $M' \supseteq$ Range $N'$, such a $Q$ does not exist, and hence the problem does not admit a solution. However, if we also allow dependence on the initial state $x_0$, affine causal solutions exist provided that, for all $t$, $M e^{A_c t} x_0 \ne 0$, which is equivalent to the requirement that $(M, A_c)$ be observable. In this case an optimal affine incentive scheme is

$$u(t) = u^t - Q(v - v^t) = u^t(t) - \phi(t) \int_0^t \frac{\psi'(s) (v(s) - v^t(s))}{\int_s^{\infty} \phi'(\sigma) \phi(\sigma)\, d\sigma}\, ds. \tag{77}$$

Next, we seek a solution in the form $u = u^t - Q(x - x^t)$, where $Q$ is causal. By using the approach outlined in Section II-B and taking the entire trajectory $x$ as the leader's observed information, we have, uniquely,

$$v(u, x) = C^{+}(\dot{x} - Ax - Bu) \tag{78}$$

where $C^{+} = (C'C)^{-1} C'$ is the pseudo-inverse of $C$. Note that in this case the absolute lower bound given by (68) is attainable, since the operator $S$ of Section II-B is invertible ($C$ being a matrix of full-column rank). Now, projecting the problem into $U \times X$ where $(u, x)$ belongs, we obtain

$$J_2(u, x) = \langle x, Q_2 x \rangle + \langle u, R_2 u \rangle + \langle (\dot{x} - Ax - Bu), \tilde{C} (\dot{x} - Ax - Bu) \rangle \tag{79}$$

where $\tilde{C} = C^{+\prime} S_2 C^{+}$. The gradients $\nabla_u J_2$ and $\nabla_x J_2$ at $(u^t, x^t)$ can be evaluated as (see Appendix B)

$$\phi(t) = \nabla_u J_2(u^t, x^t) = 2M x^t(t) \tag{80}$$

$$\psi(t) = \nabla_x J_2(u^t, x^t) = 2N x^t(t) \tag{81}$$

where

$$M = (B' C^{+\prime} S_2 S_1^{-1} C' - R_2 R_1^{-1} B') P \tag{82}$$

$$N = Q_2 + C^{+\prime} S_2 S_1^{-1} C' P A_c + A' C^{+\prime} S_2 S_1^{-1} C' P. \tag{83}$$

The conclusion we arrive at here is almost the same as in the case (73)-(74). When Range $M' \supseteq$ Range $N'$, there exists a constant gain solution $u = u^t - Q(x - x^t)$ with $Q$ satisfying $Q'M = N$. Otherwise, provided that $(M, A_c)$ is observable, there exists an affine causal solution, depending on $x_0 \ne 0$, given by

$$u(t) = u^t - Q(x - x^t) = u^t(t) - \int_0^t \frac{\phi(t) \psi'(\sigma) (x(\sigma) - x^t(\sigma))}{\int_\sigma^{\infty} \phi'(s) \phi(s)\, ds}\, d\sigma$$

where $\phi(\cdot)$ and $\psi(\cdot)$ are given by (80) and (81), respectively.

We now provide a numerical example to illustrate these results.

Example 2:

$$\dot{x} = 2x + u + v, \qquad x(0) = x_0, \qquad t \in [0, \infty)$$

$$J_1 = \int_0^{\infty} (6x^2 + u^2 + v^2)\, dt$$

$$J_2 = \int_0^{\infty} (q x^2 + r v^2)\, dt, \qquad q > 0, \quad r > 0.$$

The team solution is

$$u^t = -3x^t, \qquad v^t = -3x^t, \qquad x^t(t) = e^{-4t} x_0,$$

since from $2P^2 - 4P - 6 = 0$ we have $P = 3$ and $A_c = -4$. From (75), $L_0 = (1/2) q$. From (73)-(74), $M = (1/2) q$, $N = (1/2) q - 3r$. That is, $\phi(t) = q x^t(t)$, $\psi(t) = (q - 6r) x^t(t)$. The optimal Stackelberg strategy is

$$u(\tau) = u^t(\tau) - Q[v - v^t](\tau)$$

where the operator $Q$ is either $Q = N/M = 1 - 6r/q$ or

$$Q[g(t)](\tau) = \phi(\tau) \int_0^{\tau} \frac{\psi(t) g(t)}{\int_t^{\infty} \phi^2(s)\, ds}\, dt = 8 \left( 1 - \frac{6r}{q} \right) \int_0^{\tau} e^{-4(\tau - t)} g(t)\, dt.$$

This solution can be implemented by a first-order block with transfer function

$$\frac{U(s) - U^t(s)}{V(s) - V^t(s)} = -8 \left( 1 - \frac{6r}{q} \right) \frac{1}{s + 4}$$

where $s$ is the Laplace variable.

On the other hand, from (82) and (83)

$$M = 3r/2, \qquad N = (q - 6r)/2.$$

Thus, the optimum Stackelberg strategy in case of $x$-dependence is

$$u(\tau) = u^t(\tau) - Q[x - x^t](\tau),$$

where the operator $Q$ is either $Q = N/M = q/3r - 2$ or

$$Q[g(t)](\tau) = 3r e^{-4\tau} x_0 \int_0^{\tau} \frac{(q - 6r) e^{-4t} x_0\, g(t)}{\int_t^{\infty} 9 r^2 x_0^2 e^{-8s}\, ds}\, dt = 8 \left( \frac{q}{3r} - 2 \right) \int_0^{\tau} e^{-4(\tau - t)} g(t)\, dt$$

which may be implemented in the frequency domain by

$$\frac{U(s) - U^t(s)}{X(s) - X^t(s)} = -8 \left( \frac{q}{3r} - 2 \right) \frac{1}{s + 4}.$$
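The closed forms in Example 2 can be checked numerically. The sketch below assumes the reading $\phi(t) = q x^t(t)$, $\psi(t) = (q - 6r) x^t(t)$ with $x^t(t) = e^{-4t} x_0$. It compares the kernel $\phi(\tau)\psi(t)/\Phi(t)$ with $8(1 - 6r/q) e^{-4(\tau - t)}$, and realizes the same operator as a first-order block $\dot{y} = -4y + c\,(v - v^t)$, $y(0) = 0$, so that $u = u^t - y$. The function names and the forward-Euler integration are illustrative choices, not from the paper.

```python
import math

def riccati_root():
    """Positive root of 2 P^2 - 4 P - 6 = 0, the scalar form of (69) here."""
    return (4.0 + math.sqrt(16.0 + 48.0)) / 4.0

def kernel_from_sensitivities(tau, t, q, r, x0):
    """phi(tau) * psi(t) / Phi(t), with Phi(t) the integral of phi^2 over [t, inf)."""
    phi_tau = q * x0 * math.exp(-4.0 * tau)
    psi_t = (q - 6.0 * r) * x0 * math.exp(-4.0 * t)
    big_phi_t = (q * x0) ** 2 * math.exp(-8.0 * t) / 8.0
    return phi_tau * psi_t / big_phi_t

def kernel_closed_form(tau, t, q, r):
    """Closed-form kernel 8 (1 - 6r/q) exp(-4 (tau - t))."""
    return 8.0 * (1.0 - 6.0 * r / q) * math.exp(-4.0 * (tau - t))

def first_order_block(deviation, q, r, t_end, steps):
    """Forward-Euler integration of dy/dt = -4 y + c * deviation(t), y(0) = 0."""
    c = 8.0 * (1.0 - 6.0 * r / q)
    h = t_end / steps
    y = 0.0
    for j in range(steps):
        y += h * (-4.0 * y + c * deviation(j * h))
    return y

# Response to a constant unit deviation v - v^t = 1 with q = 12, r = 1 (so c = 4).
y_end = first_order_block(lambda t: 1.0, 12.0, 1.0, 1.0, 1000)
```

For a constant deviation the convolution integral evaluates to $(c/4)(1 - e^{-4\tau})$, which the Euler recursion approaches as the step size shrinks. Note that the kernel does not depend on $x_0$, even though the affine construction assumes $x_0 \ne 0$.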

C. The Linear Quadratic Finite-Time Closed-Loop Stackelberg Problem

In this subsection we provide an example illustrating the results obtained in Section IV when $t_\phi = t_\psi$.

Example 3:

Consider the problem with the specifics

$$\dot{x} = 2x + u + b(t) v, \qquad t \in [0, 1],$$

$$J_1 = 4x^2(1) + \int_0^1 (6x^2 + u^2 + v^2)\, dt,$$

$$J_2 = 2x^2(1) + \int_0^1 (q x^2 + r v^2)\, dt, \qquad q > 0, \quad r > 0$$

where the time-varying gain $b(t)$ is a continuous bounded function; furthermore, when $t \to 1$, $b(t) \to 0$, with the order of magnitude being

$$b(t) = b_0 \varepsilon^{\alpha} + o(\varepsilon^{\alpha}), \qquad \varepsilon = 1 - t, \quad \alpha > 0.$$

The team solution for $J_1$ is

$$u^t(t) = -2P(t) x^t(t), \qquad v^t(t) = -2P(t) b(t) x^t(t)$$

$$\dot{x}^t = (2 - 2(1 + b^2) P) x^t \;\Rightarrow\; x^t(t) = \exp\left\{ \int_0^t \left( 2 - 2P(1 + b^2) \right) ds \right\} x_0$$

$$\frac{dP}{dt} + 4P - 2P^2 (1 + b^2) + 3 = 0, \qquad P(1) = 2.$$

Both $x^t(t)$ and $P(t)$ are bounded continuous functions on $[0, 1]$.

$$\nabla_u J_2^t = \phi(\tau) = 2q \int_\tau^1 e^{2(t - \tau)} x^t(t)\, dt + 4x^t(1) e^{2(1 - \tau)}$$

$$\nabla_v J_2^t = \psi(\tau) = 2q\, b(\tau) \int_\tau^1 e^{2(t - \tau)} x^t(t)\, dt + 4x^t(1) e^{2(1 - \tau)} b(\tau) + 2r v^t(\tau).$$

When $\tau \to 1$, $\phi(\tau) \to \phi(1) = 4x^t(1)$, thus $t_\phi = 1$. Furthermore,

$$\Phi(\tau) = \int_\tau^1 \phi^2(t)\, dt = [4x^t(1)]^2 (1 - \tau) + o(1 - \tau).$$

Therefore, condition (49) is satisfied:

$$\int_0^1 \frac{\psi^2(\tau)}{\Phi(\tau)}\, d\tau < \infty$$

and by Lemma 1, the operator

$$Q[g(t)](\tau) = \phi(\tau) \int_0^{\tau} \frac{\psi(t)}{\Phi(t)} g(t)\, dt$$

is linear, bounded, and can be used in the construction of the Stackelberg strategy $u = u^t - Q(v - v^t)$.

VI. Concluding Remarks

In this paper we have discussed derivation of closed-loop Stackelberg strategies and incentive policies for a general class of dynamic decision problems with a hierarchical decision structure, in both discrete and continuous time. The first set of results involves discrete-time dynamic games in which the leader has informational advantage over the follower, in the sense that he can observe the follower's actions at each stage (before he acts) either perfectly or partially. Under a feedback Stackelberg solution concept that takes this informational advantage into account, we have studied derivation of optimal affine policies. Furthermore, we have investigated the conditions under which such a solution coincides with the global Stackelberg solution (cf. Section III).

A second set of results presented in this paper has involved an analysis of existence and derivation of causal real-time implementable global Stackelberg solutions in dynamic games wherein the leader is allowed to use memory policies. In this context, we have treated both discrete-time and continuous-time problems, and using a function space approach we have solved certain special cases both analytically and numerically (Sections II, IV, and V).

Appendix A

Derivation of (71)-(72):

Since $x(t) = e^{At} x_0 + \int_0^t e^{A(t - \tau)} (B u(\tau) + C v(\tau))\, d\tau$,

$$\langle \delta x, Q_2 x \rangle = \int_0^{\infty} \delta x'(t) Q_2 x(t)\, dt = \int_0^{\infty} \!\! \int_0^t [\delta v'(\tau) C' + \delta u'(\tau) B'] e^{A'(t - \tau)} Q_2 x(t)\, d\tau\, dt = \int_0^{\infty} d\tau \int_\tau^{\infty} dt\, [\delta v'(\tau) C' + \delta u'(\tau) B'] e^{A'(t - \tau)} Q_2 x(t).$$

Therefore, for variations $\delta u$ and $\delta v$ we have

$$\tfrac{1}{2} \delta J_2 = \langle \delta x, Q_2 x \rangle + \langle \delta u, R_2 u \rangle + \langle \delta v, S_2 v \rangle = \left\langle \delta u, R_2 u + \int_\tau^{\infty} B' e^{A'(t - \tau)} Q_2 x(t)\, dt \right\rangle + \left\langle \delta v, S_2 v + \int_\tau^{\infty} C' e^{A'(t - \tau)} Q_2 x(t)\, dt \right\rangle$$

$$\tfrac{1}{2} \nabla_u J_2 = R_2 u + \int_\tau^{\infty} B' e^{A'(t - \tau)} Q_2 x(t)\, dt$$

$$\tfrac{1}{2} \nabla_v J_2 = S_2 v + \int_\tau^{\infty} C' e^{A'(t - \tau)} Q_2 x(t)\, dt.$$

When $u = u^t$, $v = v^t$,

$$\tfrac{1}{2} \phi(\tau) = \tfrac{1}{2} \nabla_u J_2(u^t, v^t) = R_2 u^t + B' e^{-A'\tau} \int_\tau^{\infty} e^{A't} Q_2 e^{A_c t}\, dt\; x_0$$

$$\tfrac{1}{2} \psi(\tau) = \tfrac{1}{2} \nabla_v J_2(u^t, v^t) = S_2 v^t + C' e^{-A'\tau} \int_\tau^{\infty} e^{A't} Q_2 e^{A_c t}\, dt\; x_0.$$

Since

$$L_\tau = \int_\tau^{\infty} e^{A't} Q_2 e^{A_c t}\, dt = e^{A't} Q_2 e^{A_c t} A_c^{-1} \Big|_\tau^{\infty} - \int_\tau^{\infty} A' e^{A't} Q_2 e^{A_c t} A_c^{-1}\, dt = A'^{-1} e^{A't} Q_2 e^{A_c t} \Big|_\tau^{\infty} - \int_\tau^{\infty} A'^{-1} e^{A't} Q_2 e^{A_c t} A_c\, dt,$$

it follows that

$$A' L_\tau + L_\tau A_c + e^{A'\tau} Q_2 e^{A_c \tau} = 0.$$

Let $L_0$ satisfy

$$A' L_0 + L_0 A_c + Q_2 = 0;$$

then

$$L_\tau = e^{A'\tau} L_0 e^{A_c \tau}.$$

Therefore,

$$\tfrac{1}{2} \phi(\tau) = -R_2 R_1^{-1} B' P x^t(\tau) + B' L_0 x^t(\tau)$$

$$\tfrac{1}{2} \psi(\tau) = -S_2 S_1^{-1} C' P x^t(\tau) + C' L_0 x^t(\tau)$$

and relations (73) and (74) follow.
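As a hedged numerical check of the last step, the scalar sketch below solves $A' L_0 + L_0 A_c + Q_2 = 0$ directly and compares $e^{A'\tau} L_0 e^{A_c \tau}$ with a midpoint-rule approximation of $L_\tau = \int_\tau^\infty e^{A't} Q_2 e^{A_c t}\, dt$. The data $A = 2$, $A_c = -4$ match Example 2; the value of $Q_2$, the truncation horizon and the step count are arbitrary illustrative choices, and the function names are not from the paper.

```python
import math

def sylvester_scalar(a, a_c, q2):
    """Solve a*L0 + L0*a_c + q2 = 0 for scalar data (requires a + a_c != 0)."""
    return -q2 / (a + a_c)

def l_tau_quadrature(a, a_c, q2, tau, horizon=20.0, steps=40000):
    """Midpoint rule for the integral of exp(a*t) * q2 * exp(a_c*t) over [tau, horizon]."""
    h = (horizon - tau) / steps
    total = 0.0
    for j in range(steps):
        t = tau + (j + 0.5) * h
        total += q2 * math.exp((a + a_c) * t)
    return total * h

a, a_c, q2 = 2.0, -4.0, 5.0
l0 = sylvester_scalar(a, a_c, q2)          # equals q2 / 2 for these data
l_at_03 = l_tau_quadrature(a, a_c, q2, 0.3)
```

The integral converges here because $A + A_c < 0$; in the matrix case the boundary term at infinity in the integration by parts must vanish for the same argument to go through.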
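Appendix B

Derivation of Gradients (80)-(81): Note that

$$J_2(u, x) = \langle x, Q_2 x \rangle + \langle u, R_2 u \rangle + \langle (\dot{x} - Ax - Bu), \tilde{C} (\dot{x} - Ax - Bu) \rangle$$

and consider only those $x$ and $u$ with their values and variations $\delta x$ and $\delta u$ satisfying

$$\dot{x}(\infty) = x(\infty) = 0$$

$$\delta x(0) = \delta x(\infty) = \delta \dot{x}(\infty) = 0$$

$$u(\infty) = \delta u(\infty) = 0.$$

We have, for variations $\delta x$ and $\delta u$:

$$\delta \langle \dot{x}, \tilde{C} \dot{x} \rangle = 2 \langle \delta \dot{x}, \tilde{C} \dot{x} \rangle = 2 \int_0^{\infty} \delta \dot{x}(t)' \tilde{C} \dot{x}(t)\, dt = 2\, \delta x' \tilde{C} \dot{x} \Big|_0^{\infty} - 2 \int_0^{\infty} \delta x' \tilde{C} \ddot{x}\, dt = -2 \int_0^{\infty} \delta x' \tilde{C} \ddot{x}\, dt$$

$$\nabla_x \langle \dot{x}, \tilde{C} \dot{x} \rangle = -2 \tilde{C} \ddot{x}$$

$$\delta \langle \dot{x}, \tilde{C} B u \rangle = \int_0^{\infty} \delta \dot{x}' \tilde{C} B u\, dt + \int_0^{\infty} \dot{x}' \tilde{C} B\, \delta u\, dt = \int_0^{\infty} \dot{x}' \tilde{C} B\, \delta u\, dt + \delta x' \tilde{C} B u \Big|_0^{\infty} - \int_0^{\infty} \delta x' \tilde{C} B \dot{u}\, dt$$

$$\nabla_x \langle \dot{x}, \tilde{C} B u \rangle = -\tilde{C} B \dot{u}$$

$$\nabla_u \langle \dot{x}, \tilde{C} B u \rangle = B' \tilde{C} \dot{x}.$$

Therefore,

$$\tfrac{1}{2} \nabla_u J_2 = R_2 u - B' \tilde{C} \dot{x} + B' \tilde{C} A x + B' \tilde{C} B u = R_2 u - B' \tilde{C} C v$$

$$\tfrac{1}{2} \nabla_x J_2 = Q_2 x - \tilde{C} \ddot{x} + A' \tilde{C} A x + \tilde{C} A \dot{x} - A' \tilde{C} \dot{x} + \tilde{C} B \dot{u} + A' \tilde{C} B u = Q_2 x - \tilde{C} C \dot{v} - A' \tilde{C} C v = Q_2 x - C^{+\prime} S_2 \dot{v} - A' C^{+\prime} S_2 v.$$

At point $(u^t, x^t)$, these expressions become equal to

$$\tfrac{1}{2} \phi(t) = \tfrac{1}{2} \nabla_u J_2(u^t, x^t) = (-R_2 R_1^{-1} B' P x^t + B' \tilde{C} C S_1^{-1} C' P x^t)(t) = (B' \tilde{C} C S_1^{-1} C' - R_2 R_1^{-1} B') P x^t(t) = M(t) x^t(t)$$

$$\tfrac{1}{2} \psi(t) = \tfrac{1}{2} \nabla_x J_2(u^t, x^t) = (Q_2 x^t + C^{+\prime} S_2 S_1^{-1} C' P \dot{x}^t + A' C^{+\prime} S_2 S_1^{-1} C' P x^t)(t) = (Q_2 x^t + C^{+\prime} S_2 S_1^{-1} C' P A_c x^t + A' C^{+\prime} S_2 S_1^{-1} C' P x^t)(t) = (Q_2 + C^{+\prime} S_2 S_1^{-1} C' P A_c + A' C^{+\prime} S_2 S_1^{-1} C' P) x^t(t) = N(t) x^t(t).$$

This then completes the verification.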
Abstract

In  this  study  we  introduce  a  general  definition  of  an  equilibrium 
concept  (called  "strong  equilibrium")  for  both  discrete  and  continuous  time 
dynamic  games  and  under  varying  (symmetrical  and  asymmetrical)  modes  of  play. 

The  underlying  system  is  stochastic,  with  structural  and  modal  uncertainties 
determined  by  a  finite  state  jump  process.  The  new  equilibrium  concept 
encompasses  both  the  feedback  Nash  and  feedback  Stackelberg  solution  concepts 
for  the  special  cases  of  deterministic  discrete-time  games  with  symmetrical 
and  asymmetrical  modes  of  play,  respectively,  and  it  also  provides  a 
convenient  framework  for  the  introduction  of  a  feedback  Stackelberg  solution 
concept  in  deterministic  differential  games.  For  the  general  class  of  stochastic 
nonzero-sum games with structural and modal uncertainties, and under the feedback
closed-loop  information,  we  obtain  the  optimality  conditions  in  both  discrete 
and  continuous  time.  Certain  special  cases  are  also  studied,  and  the 
intrinsic  relationship  between  information  patterns  and  possible  definitions 
of  value  in  nonzero-sum  differential  games  is  clarified. 
1. Introduction

Stackelberg  solution  concepts  arise  in  games  with  asymmetrical 
modes  of  play,  with  one  of  the  players,  called  the  leader,  having  the 
ability  and  power  to  announce  his  move  (or  policy)  first,  leaving  to  the 
other  players  the  possibility  to  react. 

This  solution  concept,  first  introduced  by  Von  Stackelberg  [1] 
in  the  realm  of  the  economic  theory  of  imperfect  competition,  attracted 
the  attention  of  control  theorists  concerned  with  hierarchical  systems, 
with  the  first  set  of  related  results  documented  in  the  works  of  Chen  and 
Cruz  [11],  Simaan  and  Cruz  [2a,  2b],  and  Castanon  [12].  It  was  soon 
discovered  that  the  derivation  of  the  closed-loop  Stackelberg  solution 
(corresponding  to  the  case  when  all  players  acquire  closed-loop  state 
information)  involved  an  extremely  challenging  (nonclassical)  class  of 
optimization  problems,  and  that  it  would  generally  not  lend  itself  to  a 
dynamic programming approach, the absence of a "tenet of transition" precluding
the possibility of applying the standard backward induction
procedure.  For  an  up-to-date  account  of  these  aspects  of  the  Stackelberg 
problem,  and  in  view  of  the  recent  developments  on  the  solvability  of 
the  closed-loop  Stackelberg  game  by  also  allowing  for  memory  policies, 
we  refer  the  reader  to  [3],  and  also  to  [13]. 

As  an  alternative  to  the  closed-loop  Stackelberg  solution, 
Simaan  and  Cruz  introduced  in  [2]  the  concept  of  feedback  Stackelberg 
solution  where  the  leadership  is  defined  and  implemented  sequentially, 
in  the  spirit  of  the  dynamic  programming  approach.  In  other  words,  the 
feedback Stackelberg solution is defined recursively, by solving a
static Stackelberg game at every stage of the decision process and by
proceeding  in  retrograde  time.  It  was  defined  only  for  discrete-time 
systems,  and  the  concept  was  hastily  discredited  when  it  appeared  that 
the  leader,  in  such  a  game,  was  not  assured  of  a  better  payoff  than  what 
he  could  obtain  if  he  were  playing  according  to  the  Nash  equilibrium 
solution  concept. 

Our objective here is to provide a new interpretation of the
feedback Stackelberg solution, which permits us to relate the Nash
equilibrium, the closed-loop Stackelberg solution and the feedback
Stackelberg solution as various manifestations of a central concept in
dynamic games, which we call "strong (feedback) equilibrium". Using this
new "unified" concept, and adopting an approach similar to that used by
Friedman [10] for the definition of differential games (i.e. continuous-time
dynamic games), we will then be able to extend the concept of a
feedback Stackelberg solution to the case of systems described by ordinary
differential equations. Furthermore, the leadership is often a changing
"gift",  and  hence  it  may  rotate  between  different  players  according 
to  some  pre-chosen  deterministic  or  random  rule.  Using  our  new  extended 
solution  concept,  we  will  be  able  to  model  situations  where  the  mode  of 
play  is  randomly  evolving  in  time,  possibly  dependent  on  the  past  moves 
(actions)  of  the  players  as  observed  through  the  current  value  of  the  state. 

In Section 2 the so-called "single-act Stackelberg game" is described,
using the fundamental concept of an extensive form of a two-person
game. It is then shown that the Stackelberg solution concept is in
fact an equilibrium solution (so-called strong equilibrium) associated
with a peculiar information structure. This "single-act game" incorporates
all the essential ingredients of the most general structure for
a two-person game: First a chance player acts and decides on the system
to be controlled and the mode of play (i.e. whether or not there will
be asymmetry in the information pattern, and in case of an asymmetry who
will be the leader); then, the game is played according to the rules set
by the chance move.

In  section  3  this  basic  structure,  as  well  as  the  concept  of 
strong  equilibrium,  is  extended  to  a  multi-stage  decision  framework, 
where  the  players  have  access  to  the  current  value  of  the  state  of  the 
dynamic  system  and  the  outcome  of  the  finite-state  jump  process  which 
characterizes  the  underlying  dynamic  system  and  the  current  mode  of  play. 
For  this  class  of  stochastic  dynamic  games,  we  obtain  a  set  of  recursive 
equations  which  completely  characterizes  the  strong  equilibrium  solution. 
Two  special  cases  of  this,  corresponding  to  the  fixed  asymmetric  and 
symmetric  modes  of  play,  are  the  feedback  Stackelberg  and  the  feedback 
Nash  solutions,  respectively. 

In Section 4 the set-up of Section 3 is extended to
continuous time. Here, we consider the class of two-person differential
games  in  which  the  mode  of  play  and  the  structure  of  the  underlying  system 
are  determined,  at  each  point  in  time,  as  the  outcome  of  a  finite  state 
jump  process  evolving  in  continuous  time  and  depending  on  the 
current  value  of  the  state.  Both  players  observe  this  outcome  and 
the  current  value  of  the  state,  and  they  play  the  specific  game 
chosen  by  the  chance  mechanism,  using  feedback  control  laws.  Of 
course,  in  doing  this,  the  players  have  to  anticipate  the  future 
moves and possible realizations of the jump process that determine
the future rules of the game. For such a stochastic differential game,
it is not possible to introduce the concept of an equilibrium directly;
however, by introducing a sequence of G(δ)-games which are discretized
(in time) versions of the original differential game, and adopting a
generalized definition of a strategy on "supergames" "à la Friedman"
[10], we are able to provide a definition of what we call a "strong
equilibrium" (as opposed to "weak equilibrium" which is also elucidated in the
text). For the special case of deterministic differential games with a
fixed  asymmetric  mode  of  play,  this  new  concept  provides  a  natural 
counterpart  of  the  discrete-time  feedback  Stackelberg  solution. 

After  introducing  the  general  "strong  equilibrium"  solution, 
we also derive in Section 4 the Hamilton-Jacobi equation associated with
the  optimal  feedback  solutions.  Then,  in  section  5,  we  treat  some 
special  cases,  viz.  the  purely  deterministic  differential  game  with  a 
fixed  asymmetric  mode  of  play,  and  the  linear-quadratic  differential 
game  in  which  the  jump  process  determines  (independent  of  the  current 
value  of  the  state)  only  the  mode  of  the  play.  For  the  former  case  we 
show  that,  in  general,  the  feedback  Stackelberg  solution  is  different 
from  the  feedback  Nash  equilibrium  solution,  and  in  this  context  we 
clarify  and  extend  a  result  obtained  in  [10]  with  regard  to  limit  points 
of equilibria of G(δ)-games under asymmetric modes of play.

Section  6  includes  some  discussions  and  concluding  remarks, 
and the two appendices provide derivations of some of the results used
in the main body.

2. Strong and Weak Equilibrium Solutions, and the
Relationship Between Stackelberg and Nash Equilibria


Let $U_k \subset \mathbb{R}^{m_k}$ be a measurable space to which the control variable
$u_k$ of player k (Pk) belongs (k=1,2). Let $J_k(u_1,u_2)$, $J_k: U_1 \times U_2 \to \mathbb{R}$, be a
real-valued function denoting the cost functional of Pk. Stipulating an
asymmetry in the roles of the players, let P1 be the leader, announcing
his control (constant policy) $u_1^0 \in U_1$ first, to which P2 reacts optimally by
minimizing $J_2(u_1^0,u_2)$ over $u_2 \in U_2$. Let us assume that the reaction set

$$R_2(u_1) = \{ \nu \in U_2 : J_2(u_1,\nu) = \min_{u_2 \in U_2} J_2(u_1,u_2) \} \tag{2.1}$$

is a singleton, so that there exists a unique mapping $T_2: U_1 \to U_2$ with the
property

$$J_2(u_1,T_2(u_1)) = \min_{u_2 \in U_2} J_2(u_1,u_2), \quad \forall u_1 \in U_1. \tag{2.2}$$

Then, we call a pair $(u_1^*,u_2^*) \in U_1 \times U_2$ a Stackelberg solution [1,2,3] for
the static game, with P1 as the leader, if

$$u_1^* = \arg\min_{u_1 \in U_1} J_1(u_1,T_2(u_1)), \qquad u_2^* = T_2(u_1^*). \tag{2.3}$$

A Stackelberg solution with P2 as the leader can be defined analogously,
by interchanging the roles of the players. Furthermore, a pair
$(\hat u_1,\hat u_2) \in U_1 \times U_2$
is called a Nash equilibrium solution [3,4] for a static game in which the
roles of the players are symmetric, if it satisfies the pair of
inequalities

$$J_1(\hat u_1,\hat u_2) \le J_1(u_1,\hat u_2), \quad \forall u_1 \in U_1 \tag{2.4a}$$

$$J_2(\hat u_1,\hat u_2) \le J_2(\hat u_1,u_2), \quad \forall u_2 \in U_2. \tag{2.4b}$$
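
As an illustration of definitions (2.1)-(2.4), the following minimal Python sketch evaluates both solution concepts on a finite two-action game. The cost matrices `J1`, `J2` and the function names are hypothetical choices made for this example only; the sketch assumes finite control sets and, as in the text, a singleton reaction set for the follower.

```python
from itertools import product

# Hypothetical cost matrices (both players minimize): entry [u1][u2].
J1 = [[1, 4],
      [0, 3]]
J2 = [[0, 1],
      [2, 1]]

def reaction(J2, u1):
    """T2(u1): the follower's unique best reply, as in (2.2)."""
    row = J2[u1]
    best = min(row)
    replies = [u2 for u2, cost in enumerate(row) if cost == best]
    if len(replies) != 1:
        raise ValueError("reaction set is not a singleton")
    return replies[0]

def stackelberg(J1, J2):
    """Stackelberg solution with P1 as the leader, as in (2.3)."""
    T2 = [reaction(J2, u1) for u1 in range(len(J1))]
    u1_star = min(range(len(J1)), key=lambda u1: J1[u1][T2[u1]])
    return u1_star, T2[u1_star]

def nash(J1, J2):
    """All pure-strategy pairs satisfying inequalities (2.4a)-(2.4b)."""
    n1, n2 = len(J1), len(J1[0])
    return [(a, b) for a, b in product(range(n1), range(n2))
            if all(J1[a][b] <= J1[c][b] for c in range(n1))
            and all(J2[a][b] <= J2[a][d] for d in range(n2))]

print(stackelberg(J1, J2))  # leader commits first
print(nash(J1, J2))         # symmetric mode of play
```

In this particular example the two concepts select different pairs, and the leader's cost is lower under the Stackelberg solution than under the Nash solution; this is a property of the chosen matrices, not a general claim.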

For dynamic games the same definitions of Stackelberg and Nash
equilibrium solutions apply, provided that we have a normal form description
of the game [3], in which case $u_k$ and $U_k$ will have to be interpreted
as the strategy and strategy space, respectively, of Pk. Moreover, static
Stackelberg games can be viewed as special types of dynamic Nash games
wherein the Stackelberg solution concept coincides with a particular type
of Nash equilibrium solution, as to be elucidated in the sequel.

Consider the static two person game $\{J_k,U_k;\ k=1,2\}$ introduced
earlier, with P1 acting as the leader. Introduce a 2-stage nonzero sum
single act dynamic game $\{\bar J_k,\Gamma_k;\ k=1,2\}$ whose extensive form description is
as follows:

State equations:

$$x_2 = (u_1', 0'_{m_2})' \tag{2.5a}$$

$$x_3 = x_2 + (0'_{m_1}, u_2')' \tag{2.5b}$$

Strategies: Constant mapping $\gamma_1$ with values in $U_1$ for P1 [i.e., $\Gamma_1 = U_1$];
measurable mapping $\gamma_2(x_2)$, $\gamma_2: \mathbb{R}^m \to U_2$,
for P2, with the corresponding space denoted $\Gamma_2$.

Cost functions: $\bar J_k = J_k(\bar x_2,\bar x_3)$, where

$$\bar x_2 \triangleq [I_{m_1},\ 0_{m_1 \times m_2}]\, x_2, \qquad \bar x_3 \triangleq [0_{m_2 \times m_1},\ I_{m_2}]\, x_3. \tag{2.6}$$



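Now note that to each $\gamma_2 \in \Gamma_2$ there corresponds a unique $\tilde\gamma_2: \mathbb{R}^{m_1} \to U_2$,
satisfying $\tilde\gamma_2(\bar x_2) = \gamma_2(x_2)$, so that $\bar J_k(u_1,\gamma_2) = J_k(u_1,\tilde\gamma_2)$. Hence,
if $(u_1^*,u_2^*) \in U_1 \times U_2$ is a Stackelberg solution with P1 as the leader,
relations (2.2) and (2.3) imply that

$$\begin{aligned}
J_1(u_1^*,T_2(u_1^*)) &\le J_1(u_1,T_2(u_1)), && \forall u_1 \in U_1 \\
J_2(u_1^*,T_2(u_1^*)) &\le J_2(u_1^*,\tilde\gamma_2(u_1^*)), && \forall \tilde\gamma_2: U_1 \to U_2 \\
u_2^* &= T_2(u_1^*)
\end{aligned} \tag{2.7}$$

$\Longleftrightarrow$ there exists a $\gamma_2^* \in \Gamma_2$, $\gamma_2^*([u_1^{*\prime}, 0'_{m_2}]') = u_2^*$, such that

$$\begin{aligned}
\bar J_1(u_1^*,\gamma_2^*) &\le \bar J_1(u_1,\gamma_2^*), && \forall u_1 \in U_1 \\
\bar J_2(u_1^*,\gamma_2^*) &\le \bar J_2(u_1^*,\gamma_2), && \forall \gamma_2 \in \Gamma_2
\end{aligned} \tag{2.8}$$

[Note that $\gamma_2^*(x_2) = T_2(\bar x_2)$].
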

Therefore, the conclusion is that, if $(u_1^*, u_2^* = T_2(u_1^*))$ constitutes
a Stackelberg solution to the static game $\{J_k,U_k;\ k=1,2\}$, with P1 as the
leader, the strategy pair $(u_1^*,\gamma_2^*) \in U_1 \times \Gamma_2$, with $\gamma_2^*(x_2) = T_2(\bar x_2)$, $\forall \bar x_2 \in U_1$,
constitutes a Nash equilibrium solution to the 2-stage dynamic game
$\{\bar J_k,\Gamma_k;\ k=1,2\}$. However, the converse statement is not true; that is,
not every Nash solution of $\{\bar J_k,\Gamma_k;\ k=1,2\}$ (satisfying inequalities (2.8))
corresponds to a Stackelberg solution of the static game, mainly
because of informational nonuniqueness [3,5,6]. A delayed commitment
(feedback) Nash equilibrium solution $(u_1^0,\gamma_2^0) \in U_1 \times \Gamma_2$ for the dynamic
game, on the other hand, satisfies the pair of inequalities [3]

$$\begin{aligned}
\bar J_1(u_1^0,\gamma_2^0(x_2^0)) &\le \bar J_1(u_1,\gamma_2^0(x_2)), && \forall u_1 \in U_1 \\
\bar J_2(u_1,\gamma_2^0(x_2)) &\le \bar J_2(u_1,\gamma_2(x_2)), && \forall u_1 \in U_1,\ \gamma_2 \in \Gamma_2
\end{aligned} \tag{2.9}$$

which can equivalently be written as

$$\begin{aligned}
J_1(u_1^0,\tilde\gamma_2^0(u_1^0)) &\le J_1(u_1,\tilde\gamma_2^0(u_1)), && \forall u_1 \in U_1 \\
J_2(u_1,\tilde\gamma_2^0(u_1)) &\le J_2(u_1,\tilde\gamma_2(u_1)), && \forall u_1 \in U_1,\ \tilde\gamma_2: U_1 \to U_2
\end{aligned} \tag{2.10}$$

where $\tilde\gamma_2^0(u_1) = \gamma_2^0([u_1', 0'_{m_2}]')$.

Hence, $(u_1^0, u_2^0 = \tilde\gamma_2^0(u_1^0))$ constitutes a Stackelberg solution to
$\{J_k,U_k;\ k=1,2\}$ with P1 as the leader. We call such an equilibrium, when
viewed as a (feedback) Nash equilibrium of a dynamic game with hierarchical
decision structure, the "strong equilibrium," as opposed to any (informationally
nonunique Nash) solution $(u_1^*,\gamma_2^*)$ satisfying (2.8), which we call a "weak
equilibrium."

We now have the following result, which follows from the preceding
discussion and analysis.

Proposition 2.1.

If $\{J_1,J_2;U_1,U_2\}$ is a static game admitting a unique Stackelberg
solution with Pk as the leader, there exists a single-act 2-stage dynamic
feedback game $\{\bar J_k,\bar J_{\bar k};\ U_k,\Gamma_{\bar k};\ \bar k \neq k\}$ which admits a unique feedback Nash
(synonymously, delayed commitment type or strong) equilibrium solution.
Furthermore, there is a unique correspondence between these two solutions. □
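
The distinction between weak and strong equilibria of the single-act game can be checked by brute force on a small example. The sketch below is only an illustration under stated assumptions: the cost matrices are hypothetical, control sets are finite, and P2's strategy $\gamma_2$ is represented as a tuple listing his reply to each possible $u_1$. A pair is counted as a weak equilibrium when it satisfies inequalities of the type (2.8), and as strong when, in addition, $\gamma_2$ is a best reply to every $u_1$ (the delayed commitment requirement).

```python
from itertools import product

# Hypothetical costs for the underlying static game: entry [u1][u2].
J1 = [[1, 4], [0, 3]]
J2 = [[0, 1], [2, 1]]
U1, U2 = range(2), range(2)

def is_nash(u1, g2):
    """Normal-form Nash test: g2 need only be optimal at the u1 actually played."""
    p1_ok = all(J1[u1][g2[u1]] <= J1[v][g2[v]] for v in U1)
    p2_ok = all(J2[u1][g2[u1]] <= J2[u1][w] for w in U2)
    return p1_ok and p2_ok

def is_strong(u1, g2):
    """Feedback (delayed commitment) test: g2 must be optimal for every u1."""
    credible = all(J2[v][g2[v]] <= J2[v][w] for v in U1 for w in U2)
    return credible and is_nash(u1, g2)

# Every mapping from U1 to U2, written as the tuple (g2(0), g2(1)).
strategies = list(product(U2, repeat=len(U1)))
weak = [(u1, g2) for u1 in U1 for g2 in strategies if is_nash(u1, g2)]
strong = [(u1, g2) for (u1, g2) in weak if is_strong(u1, g2)]

print(weak)    # includes an informationally nonunique equilibrium
print(strong)  # the one whose outcome is the Stackelberg solution
```

For these matrices two weak equilibria exist, but only one survives the stronger test, and its outcome is the Stackelberg solution of the static game with P1 as the leader.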

Hence,  every  Stackelberg  solution  of  a  static  two-person  game 
can  be  viewed  as  a  strong  (Nash)  equilibrium  solution  of  a  particular 
dynamic  feedback  game  with  perfect  state  information.  This  correspondence 
can in fact be extended to the feedback Stackelberg solution of dynamic
feedback  games  [2,3],  by  identifying  it  as  the  feedback  Nash  equilibrium 
solution  of  related  dynamic  feedback  games  with  twice  as  many  stages. 

Towards this end, consider the N stage dynamic feedback game with state
dynamics

$$x(n+1) = f_n[x(n),u_1(n),u_2(n)]; \quad n=0,1,\dots,N-1, \tag{2.11}$$

$$x(n) \in X \subset \mathbb{R}^{m_0}, \quad u_k(n) \in U_k \subset \mathbb{R}^{m_k}, \quad k=1,2,$$

and cost functionals

$$J_k = q_{k,N}[x(N)] + \sum_{n=0}^{N-1} g_{k,n}[x(n),u_1(n),u_2(n)]; \quad k=1,2, \tag{2.12}$$

where $f_n$, $q_{k,N}$, $g_{k,n}$ are mappings of appropriate specifications. Controls
are allowed to depend only on the current value of the state, so that
admissible policies $\gamma_{k,n}$ for Pk, at stage n, are $\gamma_{k,n}(x(n))$,
$\gamma_{k,n}: X \to U_k$, satisfying certain measurability requirements.

Consider a stagewise asymmetric mode of play whereby one of the
players (say, P1) announces his strategy and moves before the other player
does, at each stage. The relevant solution concept in this case is the
so-called feedback Stackelberg solution [2,3] which is defined recursively in
retrograde time and involves the solution of a static Stackelberg game
with P1 as the leader at each stage. By basically following the arguments
that led to Proposition 2.1, it is not difficult to see that the feedback
Stackelberg solution of the N-stage game (2.11)-(2.12) corresponds uniquely
to the feedback Nash equilibrium solution of a 2N-stage dynamic feedback
game (with perfect state information) defined as follows:

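State equation:

$$y(s+1) = F_s[y(s),\bar u_1(s),\bar u_2(s)]; \quad y(0) = [x(0)', 0'_{m_1}]' \tag{2.13}$$

where

$$F_s[y,\bar u_1,\bar u_2] = \begin{cases}
y(s) + [0'_{m_0}, \bar u_1(s)']', & s \text{ even} \\
\big[ f'_{(s-1)/2}\big([I_{m_0}, 0_{m_0 \times m_1}]\,y(s),\ [0_{m_1 \times m_0}, I_{m_1}]\,y(s),\ \bar u_2(s)\big),\ 0'_{m_1} \big]', & s \text{ odd}
\end{cases} \tag{2.14}$$

$$\bar u_1(s) = u_1(s/2), \quad s \text{ even}; \qquad \bar u_2(s) = u_2((s-1)/2), \quad s \text{ odd}. \tag{2.15a}$$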




Proposition 2.2.

If $\{J_k,\Gamma_{k,n};\ k=1,2;\ n=0,1,\dots,N-1\}$ is an N-stage dynamic feedback
game as defined by (2.11)-(2.12), admitting a unique feedback Stackelberg
solution $(\gamma_1^*,\gamma_2^*)$ with P1 as the leader (at each stage), there exists a
2N-stage dynamic feedback game $\{\bar J_k,\bar\Gamma_{k,s};\ k=1,2;\ s=0,1,\dots,2N-1\}$ defined by
(2.13)-(2.20), which admits a unique feedback Nash (strong) equilibrium
solution $(\bar\gamma_1^*,\bar\gamma_2^*)$. Furthermore, there is a unique correspondence between
these two solutions, given by

$$\begin{aligned}
\bar\gamma^*_{1,2n}([x(n)', 0'_{m_1}]') &= \gamma^*_{1,n}(x(n)) \\
\bar\gamma^*_{2,2n+1}([x(n)', \gamma^{*\prime}_{1,n}(x(n))]') &= \gamma^*_{2,n}(x(n))
\end{aligned} \qquad n=0,1,\dots,N-1. \tag{2.21}$$


Remark  2.1. 

Since  every  feedback  Nash  (synonymously,  strong)  equilibrium 
solution  is  a  ("weak")  Nash  equilibrium  solution  [3,6],  the  feedback 
Stackelberg  solution  of  the  original  dynamic  game  is  also  a  Nash 
equilibrium  solution  of  the  2N-stage  dynamic  game  constructed  prior  to 
Proposition 2.2. However, we cannot claim a unique correspondence between
the two games in the framework of Nash equilibria, because there exist
informationally nonunique weak equilibria in the latter case. [Note that
the further restriction to delayed commitment strategies (feedback Nash
equilibria) eliminates this informational nonuniqueness, as discussed
extensively in [3], and leads to strong equilibria]. The important
conclusion here, though, is that the feedback Stackelberg solution in
dynamic feedback games is indeed an equilibrium solution, which is
readily seen by reformulating the problem in an appropriate framework.

3. Stochastic Dynamic Games with Structural and
Modal Uncertainties Described by Jump Processes

3.1.  Problem  Formulation 

Having  settled  the  problem  of  identifying  the  feedback  Stackelberg 
solution  as  an  equilibrium  solution  concept,  we  now  turn  to  introducing 
and  solving  a  general  class  of  such  problems  in  which  both  the  structure 
of  the  system  dynamics  (i.e.,  transitions  from  one  state  to  another)  and 
the  mode  of  play  (i.e.,  whether  the  roles  of  the  players  are  asymmetric 
or  not,  and  in  case  of  asymmetry  which  player  becomes  the  leader)  are 
uncertain  and  are  determined  by  the  outcome  of  a  finite  state  jump  process. 

More specifically, consider the N-stage stochastic dynamic game,
with state

$$y(n) = [x(n),r(n)] \in X \times I \tag{3.1}$$

at stage $n \in \mathbf{N} = \{0,1,\dots,N-1\}$, where $X \subset \mathbb{R}^{m_0}$, $I = S_1 + S_2 + \mathcal{N}$, the sets $S_1$,
$S_2$ and $\mathcal{N}$ being finite and disjoint. If $r(n) \in S_k$, then there is asymmetry
in the roles of the players and Pk acts as the leader at stage n, whereas,
if $r(n) \in \mathcal{N}$, there is no asymmetry and the players choose their controls
in accordance with the Nash equilibrium solution concept at stage n.

The control of Pk at stage n is denoted $u_k(n) \in U_k$, and the probability
of transition from state $y(n) = (x(n) = x,\ r(n) = i)$ to state
$y(n+1) = (x(n+1) \in d\xi \subset X,\ r(n+1) = j)$, as a result of feedback controls
$u_k(n) = \gamma_{k,n}[x(n),r(n)] = u_k$; k=1,2, is given by

$$Q(d\xi,j;x,i,u_1,u_2) = P\{x(n+1) \in d\xi,\ r(n+1) = j \mid x(n) = x,\ r(n) = i,\ u_1(n) = u_1,\ u_2(n) = u_2\} \tag{3.2}$$

where $Q \ge 0$ and

$$\sum_{j \in I} \int_X Q(d\xi,j;x,i,u_1,u_2) = 1, \quad \forall i \in I,\ x \in X,\ u_k \in U_k.$$

The control law of Pk at stage n, $\gamma_{k,n}[x(n),r(n)]$, is a measurable
mapping $\gamma_{k,n}: X \times I \to U_k$, $\gamma_{k,n} \in \Gamma_{k,n}$, and the cost functional of Pk is $J_{k,0}$, where

$$J_{k,p}(\gamma_1^p,\gamma_2^p;x^p,i^p) = E\Big\{ q_{k,N}[x(N)] + \sum_{n=p}^{N-1} g_{k,n}[x(n),r(n),u_1(n),u_2(n)] \ \Big|\ x(p) = x^p,\ r(p) = i^p \Big\} \tag{3.3}$$

where $E\{\cdot\}$ is the expectation operation with respect to the probability
measures that govern the transition probabilities (3.2), with $\gamma_{k,n} \in \Gamma_{k,n}$
(k=1,2; n=0,...,N-1) fixed, and $\gamma_k^p$ denotes the set of policies
$\{\gamma_{k,n};\ n=p,p+1,\dots,N-1\}$.

Note that this is a stochastic dynamic game with perfect state
information for both players, but not a standard one because the mode of
play at each stage is determined by the outcome of a jump process $\{r(\cdot)\}$,
which in turn is affected by the past decisions through (3.2). However,
since the state of the game involves the current value of $r(\cdot)$, both
players know the mode of play at the current stage (i.e., whether there
is asymmetry or symmetry in decision making at that stage, and in the
former case which player acts as the leader), and therefore, the
equilibrium solution is well-defined stagewise. Hence, the game is played
as follows (with P$\bar k$ denoting the player other than Pk):

Stage 0: Both players observe the value of $x(0)=x^0$ and the outcome of
the random variable $r(0)=r^0$. If $r^0 \in S_k$, first Pk chooses his control $u_k^0$
(and announces it) and then P$\bar k$ reacts to that by announcing his control
$u_{\bar k}^0$; if $r^0 \in \mathcal{N}$, both players choose their controls ($u_1^0$ and $u_2^0$) simultaneously.
Then, a transition to a new state $y(1)=[x(1)',r(1)']'$ takes place under
the stationary transition probability $Q(d\xi,j;x^0,r^0,u_1^0,u_2^0)$, and a cost of
$g_{k,0}[x^0,r^0,u_1^0,u_2^0]$ is incurred by Pk.

. . .

Stage n: Both players observe the value of $x(n)=x^n$ and the outcome of
the random variable $r(n)=r^n$. If $r^n \in S_k$, first Pk chooses his control
$u_k^n$ (and announces it) and then P$\bar k$ reacts to that by announcing his
control $u_{\bar k}^n$ (i.e., P$\bar k$ has an informational advantage over Pk); if
$r^n \in \mathcal{N}$, however, both players choose their controls $(u_1^n,u_2^n)$ simultaneously.
Then, a transition to a new state $y(n+1) = [x(n+1)',r(n+1)']'$ takes place
under the stationary transition probability $Q(d\xi,j;x^n,r^n,u_1^n,u_2^n)$, and an
additional cost of $g_{k,n}[x^n,r^n,u_1^n,u_2^n]$ is incurred by Pk.

. . .

Stage N-1: Both players observe the value of $x(N-1)=x^{N-1}$ and the outcome
of the random variable $r(N-1)=r^{N-1}$. If $r^{N-1} \in S_k$, first Pk chooses his
control $u_k^{N-1}$ (and announces it) and then P$\bar k$ reacts to that by announcing
his control $u_{\bar k}^{N-1}$; if $r^{N-1} \in \mathcal{N}$, both players choose their controls $(u_1^{N-1},u_2^{N-1})$
simultaneously. Then, a transition to x(N) takes place under the stationary
marginal transition probability $\sum_{j} Q(d\xi,j;x^{N-1},r^{N-1},u_1^{N-1},u_2^{N-1})$, and an
additional cost of $q_{k,N}[x(N)] + g_{k,N-1}[x^{N-1},r^{N-1},u_1^{N-1},u_2^{N-1}]$ is incurred
by Pk, so that the total cost adds up (for each sample path) to

$$q_{k,N}[x(N)] + \sum_{n=0}^{N-1} g_{k,n}[x^n,r^n,u_1^n,u_2^n]$$

for Pk.


3.2.  The  Concept  of  Strong  Equilibrium 

Even  though  the  stochastic  game  and  the  moves  of  the  players  for 
a  particular  realization  are  delineated  in  forward  time,  the  equilibrium 
solution  is  defined  in  retrograde  time.  Towards  this  end,  we  first  consider 
a  single  stage  game  which  comprises  only  the  last  stage  of  the  stochastic 
game of §3.1, with cost function $J_{k,N-1}$ for Pk. Here, if the outcome of
the random variable r(N-1) belongs to $S_k$, then the players choose their
controls (as functions of $x(N-1) \in X$ which is arbitrary) according to the
Stackelberg solution concept, with Pk acting as the leader; if, however,
r(N-1) belongs to the index set $\mathcal{N}$, the players determine their equilibrium
controls  according  to  the  Nash  solution  concept  and  for  all  values  of 
x(N-l).  Assuming  that  these  solutions  are  unique  in  each  case  (or  that 
there  is  mutual  agreement  between  the  players  as  to  which  pair  of  controls 
to  adopt  in  case  of  nonunique  equilibria),  there  will  be  a  unique  pair  of 
expected  cost-to-go  values  transferred  to  stage  N-l  in  terms  of  the 
stationary  transition  probability  Q(-). 

Next, we consider the 2-stage dynamic game problem with cost
functions $J_{k,N-2}$, k=1,2, and with the policies $\gamma^*_{k,N-1}[x(N-1),r(N-1)]$, k=1,2,
at stage N-1 being fixed as determined above. Then, this is again basically
a single stage game, with the mode of play depending on the outcome of
r(N-2), as at stage N-1; and stipulating existence of a unique equilibrium
for each element of I and for all $x(N-2) \in X$, we obtain unique equilibrium
policies $\gamma^*_{k,N-2}[x(N-2),r(N-2)]$, k=1,2, leading to a unique pair of expected
cost-to-go values to be transferred to stage N-2.

If this procedure is followed up inductively, up to the initial
stage n=0, we obtain a pair of N-tuple policies
$\{\gamma^*_{k,0},\gamma^*_{k,1},\dots,\gamma^*_{k,N-1};\ k=1,2\} \triangleq \{\gamma_1^*,\gamma_2^*\}$ which we call a "strong equilibrium" for the
stochastic dynamic game of §3.1. Note that this is indeed an equilibrium
solution, since it can be shown by following the arguments and the procedure
of Section 2 that it is related to the feedback Nash equilibrium solution
of a dynamic game with twice as many stages [see Appendix I]. In fact,
when $\mathcal{N}=\emptyset$ and $S_2=\emptyset$, strong equilibrium is identical with the stochastic
feedback Stackelberg equilibrium with P1 as the leader [3], and when
$S_k=\emptyset$, k=1,2, it coincides with the concept of feedback (delayed commitment
type) Nash equilibrium in stochastic dynamic games (which we have also
called "strong equilibrium" in Proposition 2.2).

3.3.  Derivation  of  Strong  Equilibria 

A set of necessary and sufficient conditions for a strong
equilibrium solution of the stochastic dynamic game of §3.1 can be
obtained by basically following the procedure outlined in §3.2, in the
spirit of dynamic programming since, for each fixed pair of controls,
the state $\{y(n)\}$ is a Markov process, and furthermore the controls are
restricted to be Markov (feedback) controls depending only on the
current value of the state. The result is the dynamic programming type
equations given below in Proposition 3.1 in terms of the optimum (strong
equilibrium) expected cost-to-go functions $V_{k,n}(x,i)$, k=1,2; $n \in \mathbf{N}$.

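Proposition 3.1.

Let $\{u_1^* = \gamma_1^*(\cdot),\ u_2^* = \gamma_2^*(\cdot)\}$, $(\gamma_1^*,\gamma_2^*) \in \Gamma_1 \times \Gamma_2$, denote a strong
equilibrium solution for the stochastic dynamic game of §3.1, and $J_k^*(x^0,i^0)$
denote the corresponding cost to Pk when the initial state is $x(0)=x^0$,
$r(0)=i^0$. Then, it is necessary and sufficient that the following relations
are satisfied:

$$J_k^*(x^0,i^0) = V_{k,0}(x^0,i^0), \quad k=1,2, \tag{3.4}$$

where $V_{k,n}(x,i)$ is recursively defined, for k=1,2, by

$$V_{k,n}(x,i) = \min_{u_k \in U_k} \Big\{ \int_X \sum_{j \in I} V_{k,n+1}(\xi,j)\, Q[d\xi,j;x,i,\pi_k(u_k,T_{\bar k,n}(u_k,x,i))] + g_{k,n}[x,i,\pi_k(u_k,T_{\bar k,n}(u_k,x,i))] \Big\}, \quad i \in S_k \tag{3.5a}$$

$$V_{k,n}(x,i) = \int_X \sum_{j \in I} V_{k,n+1}(\xi,j)\, Q[d\xi,j;x,i,\pi_k(\gamma^*_{k,n}(x,i),\gamma^*_{\bar k,n}(x,i))] + g_{k,n}[x,i,\pi_k(\gamma^*_{k,n}(x,i),\gamma^*_{\bar k,n}(x,i))], \quad i \in S_{\bar k} \tag{3.5b}$$

$$V_{k,n}(x,i) = \min_{u_k \in U_k} \Big\{ \int_X \sum_{j \in I} V_{k,n+1}(\xi,j)\, Q[d\xi,j;x,i,\pi_k(u_k,\gamma^*_{\bar k,n}(x,i))] + g_{k,n}[x,i,\pi_k(u_k,\gamma^*_{\bar k,n}(x,i))] \Big\}, \quad i \in \mathcal{N} \tag{3.5c}$$

with the boundary condition

$$V_{k,N}(x,i) = q_{k,N}(x), \quad \forall i \in I,\ k=1,2. \tag{3.6}$$

Here, $T_{\bar k,n}(u_k,x,i)$ is defined by

$$T_{\bar k,n}(u_k,x,i) = \arg\min_{u_{\bar k} \in U_{\bar k}} \Big\{ \int_X \sum_{j \in I} V_{\bar k,n+1}(\xi,j)\, Q[d\xi,j;x,i,u_1,u_2] + g_{\bar k,n}[x,i,u_1,u_2] \Big\}, \quad i \in S_k, \tag{3.7}$$

$$\pi_k(u_k,T_{\bar k,n}) = \begin{cases} (u_1,T_{2,n}) & \text{if } k=1 \\ (T_{1,n},u_2) & \text{if } k=2 \end{cases} \tag{3.8}$$

$$\gamma^*_{k,n}(x,i) = \begin{cases}
\text{argument of (3.5a) for all } x \in X, & \text{if } i \in S_k \\
T_{k,n}[\gamma^*_{\bar k,n}(x,i),x,i], & \text{if } i \in S_{\bar k} \\
\text{argument of (3.5c) for all } x \in X, & \text{if } i \in \mathcal{N}.
\end{cases} \tag{3.9}$$

The strong equilibrium solution is unique whenever (3.7) and (3.9) are
uniquely defined. □

Remark 3.1.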

One special class of problems whose strong equilibrium solution
can be determined explicitly is the class characterized by a linear state
equation, quadratic cost functionals, and transition probabilities of the
jump process in (3.2) that are independent of the state and controls, i.e.,
constant for each pair of modes (i,j). In this case the equilibrium strategies will be
linear functions of the current value of the state, with the multiplying
gain matrices depending on the outcome of the Markov jump process $\{r(\cdot)\}$;
exact expressions can readily be determined from (3.5) by recursive
evaluation.
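
The backward recursion of Proposition 3.1 can be sketched for a finite problem as follows. This is a minimal illustration, not the construction used in the text: it assumes finite sets X, I, U1, U2, pure strategies only, and a first-found tie-breaking rule; the names `stage_solution`, `strong_equilibrium` and the mode labels "L1", "L2", "N" (P1 leads, P2 leads, symmetric play) are hypothetical.

```python
from itertools import product

def stage_solution(mode, c1, c2, U1, U2):
    """Solve the static game at one (x, i); c_k[a][b] is P_k's expected cost-to-go."""
    if mode == "L1":  # P1 leads: follower reaction first, then the leader's choice
        T2 = {a: min(U2, key=lambda b: c2[a][b]) for a in U1}
        lead = min(U1, key=lambda a: c1[a][T2[a]])
        return lead, T2[lead]
    if mode == "L2":  # P2 leads
        T1 = {b: min(U1, key=lambda a: c1[a][b]) for b in U2}
        lead = min(U2, key=lambda b: c2[T1[b]][b])
        return T1[lead], lead
    for a, b in product(U1, U2):  # symmetric mode: first pure Nash pair found
        if all(c1[a][b] <= c1[v][b] for v in U1) and \
           all(c2[a][b] <= c2[a][w] for w in U2):
            return a, b
    raise ValueError("no pure-strategy Nash equilibrium at this state")

def strong_equilibrium(X, I, mode, U1, U2, Q, g, q, N):
    """Q(x,i,a,b) -> {(x_next, j): prob}; g(k,n,x,i,a,b) stage cost; q(k,x) terminal cost."""
    V = {(k, x, i): q(k, x) for k in (1, 2) for x in X for i in I}
    policy = {}
    for n in reversed(range(N)):  # retrograde time
        V_new = {}
        for x, i in product(X, I):
            def cost(k, a, b):
                future = sum(p * V[(k, x2, j)] for (x2, j), p in Q(x, i, a, b).items())
                return g(k, n, x, i, a, b) + future
            c1 = {a: {b: cost(1, a, b) for b in U2} for a in U1}
            c2 = {a: {b: cost(2, a, b) for b in U2} for a in U1}
            a, b = stage_solution(mode[i], c1, c2, U1, U2)
            policy[(n, x, i)] = (a, b)
            V_new[(1, x, i)], V_new[(2, x, i)] = c1[a][b], c2[a][b]
        V = V_new
    return V, policy

# Hypothetical two-stage example: one x value, two modes, mode redrawn at random.
J1 = [[1, 4], [0, 3]]
J2 = [[0, 1], [2, 1]]
X, I, U1, U2 = [0], ["s1", "n"], [0, 1], [0, 1]
mode = {"s1": "L1", "n": "N"}
Q = lambda x, i, a, b: {(0, "s1"): 0.5, (0, "n"): 0.5}
g = lambda k, n, x, i, a, b: (J1 if k == 1 else J2)[a][b]
q = lambda k, x: 0.0
V, policy = strong_equilibrium(X, I, mode, U1, U2, Q, g, q, N=2)
```

Because the transition law in this example does not depend on the controls, the stagewise solution is the static Stackelberg pair in the asymmetric mode and the static Nash pair in the symmetric mode, which makes the returned values easy to verify by hand.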
4. Feedback Equilibria for Differential
Games with Jump Disturbances


4.1.  The  Controlled  Stochastic  System 

Consider a stochastic system of the form

$$\dot x = f^{r(t)}(t,x,u_1,u_2), \quad x \in X,\ u_k \in U_k,\ k=1,2 \tag{4.1}$$

with an initial condition

$$x(0) = x^0, \qquad r(0) = r^0. \tag{4.2}$$

In (4.1), r(t) is a finite-state stochastic jump process and the
RHS changes from $f^i(t,x,u_1,u_2)$ to $f^j(t,x,u_1,u_2)$ as r(t) jumps from i to j.
In (4.2), $r^0$ is a random variable determining the initial state of the
process r(t).

The state x belongs to $X \subset \mathbb{R}^m$, and the control $u_k$ takes values in
$U_k \subset \mathbb{R}^{m_k}$ (k=1,2). At a fixed terminal time T, there are bounded functions

$$q_k^i(x): X \to \mathbb{R}, \quad k=1,2,\ i \in I \tag{4.3}$$

which are continuously differentiable, with bounded derivatives in x,
and they determine the respective terminal costs incurred by the two
players k=1,2, if r(T)=i and x(T)=x. Let I denote the state set of r(t).
For each i in I, let $f^i(t,x,u_1,u_2)$,

$$f^i: [0,T] \times X \times U_1 \times U_2 \to \mathbb{R}^m,$$

be a continuous bounded function, continuously differentiable with bounded
partial derivatives in x, $u_1$ and $u_2$. Let $\mathcal{U}_k$, k=1,2, be two classes of
admissible control laws $u_k^i(t,x)$ with values in $U_k$, defined on $I \times [0,T] \times X$
such that $u_k^i(t,x)$ is piecewise continuous in t, continuously differentiable
with bounded derivatives in x.

In order to introduce the controlled stochastic process, we
suppose that a measurable space $(\Omega,\mathcal{B})$ is given, called the sample space.
We consider a function $y(t,\omega)$,

$$y: [0,T] \times \Omega \to X \times I, \qquad y(t,\omega) = (x(t,\omega)',r(t,\omega)')',$$

which is measurable w.r.t. $\mathcal{B}_{[0,T]} \times \mathcal{B}$, where $\mathcal{B}_{[0,T]}$ denotes the Borel
σ-field on [0,T] and X × I is assumed to be a measurable state space.
Let $\mathcal{B}_t = \sigma\{y(s,\cdot) \mid s \le t\}$ be the σ-fields generated by past observations
of y up to time t.

We now assume

A1. The behavior of the system under any admissible control law
$u \in \mathcal{U}_1 \times \mathcal{U}_2$ is completely described by a probability measure $\mathcal{P}_u$ on
$(\Omega,\mathcal{B})$.

Thus, the process

$$y_u = (y(t,\cdot),\mathcal{B}_t,\mathcal{P}_u), \quad t \in [0,T]$$

is a well-defined stochastic process.

A2. For any control law $u \in \mathcal{U}_1 \times \mathcal{U}_2$ and almost any $\omega$ in $\Omega$, there exists a
piecewise constant function $\mu_\omega^u(t,x)$,

$$\mu_\omega^u: [0,T] \times X \to I,$$

such that $y(t,\omega) = [x'(t,\omega),r'(t,\omega)]'$ satisfies the following equations

$$\dot x = f^{\mu_\omega^u(t,x)}\big(t,x,u_1^{\mu_\omega^u(t,x)}(t,x),u_2^{\mu_\omega^u(t,x)}(t,x)\big) \tag{4.4}$$

$$r(t,\omega) = \mu_\omega^u(t,x) \tag{4.5}$$

where $r(t,\omega)$ is a step function with a finite number of jumps on [0,T].

A3. For each (t,x) in $[0,T] \times X$, there exists a matrix with elements $\lambda_{ij}(t,x)$
which are real-valued, continuous, bounded functions, such that

$$\lambda_{ij}(t,x) \ge 0, \quad \forall t \in [0,T],\ x \in X,\ i \neq j; \qquad \sum_{j \in I} \lambda_{ij}(t,x) = 0, \quad \forall t \in [0,T],\ x \in X \tag{4.6}$$

and such that for any admissible control law $u \in \mathcal{U}_1 \times \mathcal{U}_2$

$$P_u[\mu(t+h,x(t+h)) = j \mid \mu(t,x(t)) = i,\ x(t) = x] = \lambda_{ij}(t,x)h + o(h;u,x), \quad i \neq j \tag{4.7}$$

$$P_u[\mu(t+h,x(t+h)) = i \mid \mu(t,x(t)) = i,\ x(t) = x] = 1 + \lambda_{ii}(t,x)h + o(h;u,x) \tag{4.8}$$

where o(h;u,x) is a quantity such that

$$\lim_{h \downarrow 0} \frac{o(h;u,x)}{h} = 0 \tag{4.9}$$

uniformly for all x in X and u in $\mathcal{U}_1 \times \mathcal{U}_2$.


The  assumption  A1  allows  the  modeling  of  the  system  as  a  controlled 
probability  space.  Assumption  A2  describes  the  assumed  relationship 
between  x(t)  and  r(t)  through  the  differential  system  (4.1).  Assumption 
A3  is  a  conditional  Markov  assumption  on  the  jump  process. 
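The conditional Markov assumption A3 can be illustrated numerically. The sketch below is not from the report: the function names and the three-mode rate matrix are hypothetical, the rates are taken constant in $(t,x)$, and the $o(h)$ term of (4.7)-(4.8) is simply dropped. It checks the conditions in (4.6) and builds the first-order one-step transition probabilities.

```python
def is_generator(rates, tol=1e-12):
    """Check (4.6): off-diagonal rates are nonnegative and every row sums to zero."""
    for i, row in enumerate(rates):
        if any(r < 0 for j, r in enumerate(row) if j != i):
            return False
        if abs(sum(row)) > tol:
            return False
    return True


def one_step_transition(rates, h):
    """First-order form of (4.7)-(4.8): P[i][j] = delta_ij + rates[i][j] * h (o(h) dropped)."""
    n = len(rates)
    return [[(1.0 if i == j else 0.0) + rates[i][j] * h for j in range(n)]
            for i in range(n)]


# Illustrative three-mode rate matrix; the numbers are made up for this example.
RATES = [[-0.5, 0.3, 0.2],
         [0.1, -0.4, 0.3],
         [0.2, 0.2, -0.4]]
P = one_step_transition(RATES, h=0.01)
```

For a small step $h$ each row of `P` is a probability vector, which is the sense in which the rate matrix drives the jumps of $r(t)$ between modes of play.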

For an admissible control $u \in \mathcal{U}_1 \times \mathcal{U}_2$, $u \triangleq (u_1^i(t,x), u_2^i(t,x))$, let $V_k^i(t,x)$ denote the corresponding values of the conditional expectation:

$$V_k^i(t,x) = E\big[q_k^{r(T)}(x(T)) \mid x(t) = x,\ r(t) = i\big], \qquad k=1,2,$$

which will henceforth be referred to as the cost-to-go function for player $k$. As regards this function, we now state two lemmas which will be used later in our analysis.

Lemma 4.1: The cost-to-go functions $V_k^i(t,x)$, $k=1,2$, associated with an admissible control law $u \in \mathcal{U}_1 \times \mathcal{U}_2$ satisfy the system of partial differential equations


Note that we have only terminal cost functions for both players, which simplifies the mathematical derivation to be given in the sequel without any real loss of generality.


4.2. $G(\delta)$-Differential Games

Consider a fixed partition of the set $I$ of possible values of $r(t)$ into three subsets

$$I = S_1 + S_2 + \mathcal{N} \qquad (4.11)$$

Consider also a partition of $[0,T]$ into $N$ subintervals of length $\delta = T/N$,

$$[0,T] = [0,t_1) \cup [t_1,t_2) \cup \dots \cup [t_{N-1},T]$$


and on each subinterval $[t_\ell, t_{\ell+1})$ introduce the classes of controls

$$U_k^{\delta,\ell} = \{u_k(\cdot) : [t_\ell, t_{\ell+1}) \to U_k\}, \qquad k=1,2.$$

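A $\delta$-strategy for player $k$ is a vector

$$\gamma_k^\delta = (\gamma_k^{\delta,\ell})_{\ell = 0,1,\dots,N-1}$$

where each component $\gamma_k^{\delta,\ell}$ is a function

$$\gamma_k^{\delta,\ell} : X \times I \times U_{\bar k}^{\delta,\ell} \to U_k^{\delta,\ell} \qquad (4.12)$$

which associates a unique control $u_k^{\delta,\ell}$ in $U_k^{\delta,\ell}$,

$$u_k^{\delta,\ell}(\cdot) = \gamma_k^{\delta,\ell}\big(x(t_\ell), r(t_\ell), u_{\bar k}^{\delta,\ell}(\cdot)\big)$$

with each state $(x'(t_\ell), r'(t_\ell))'$ observed at sampled time $t_\ell$ and each control $u_{\bar k}^{\delta,\ell}(\cdot)$ in $U_{\bar k}^{\delta,\ell}$ (as in §3, $\bar k$ stands for 2 if $k=1$, and 1 if $k=2$), such that the following condition holds:

$$r(t_\ell) = i \in \mathcal{N} \cup S_k \ \Rightarrow\ u_k^{\delta,\ell} = \gamma_k^{\delta,\ell}\big(x(t_\ell), r(t_\ell), u_{\bar k}^{\delta,\ell}(\cdot)\big) = \tilde\gamma_k^{\delta,\ell}\big(x(t_\ell), r(t_\ell)\big) \qquad (4.13)$$

where $\tilde\gamma_k^{\delta,\ell} : X \times I \to U_k^{\delta,\ell}$.

This last condition ensures that, if $r(t_\ell)$ is in $\mathcal{N}$ or $S_k$, Pk cannot adjust the control $u_k^{\delta,\ell}(\cdot)$ he will use on the time interval $[t_\ell, t_{\ell+1})$ to the control $u_{\bar k}^{\delta,\ell}(\cdot)$ chosen by his opponent.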

Given a $\delta$-strategy pair $\gamma^\delta = (\gamma_1^\delta, \gamma_2^\delta)$, the game is played as follows (analogously to the discrete-time game discussed in Section 3).

1. The two players observe $x(0) = x^0$ and the initial value $r(0) = i^0$. If $i^0 \in S_1$, player 1 is the leader and he has to move first; if $i^0 \in S_2$, player 2 is the leader and moves first; if $i^0 \in \mathcal{N}$ the two players move simultaneously.

2. At any sampled time $t_\ell$, if player $k$ is the leader (i.e., $r(t_\ell) \in S_k$), or if there is no leader (i.e., if $r(t_\ell) \in \mathcal{N}$), then this player moves first by choosing his control $u_k^{\delta,\ell}(\cdot)$ according to the mapping $\tilde\gamma_k^{\delta,\ell}(x(t_\ell), r(t_\ell))$. If $r(t_\ell) \in S_k$, then the other player, P$\bar k$, is the follower and he chooses his control $u_{\bar k}^{\delta,\ell}(\cdot)$ according to the mapping $\gamma_{\bar k}^{\delta,\ell}(x(t_\ell), r(t_\ell), u_k^{\delta,\ell}(\cdot))$.

3. Once the controls $(u_1^{\delta,\ell}, u_2^{\delta,\ell})$ have been determined, the system evolves from $(t_\ell, x(t_\ell), r(t_\ell))$ to $(t_{\ell+1}, x(t_{\ell+1}), r(t_{\ell+1}))$. Again the leadership is determined by the value of $i^{\ell+1}$ and the game is played as described above.
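The order of play within one stage can be sketched in a few lines. This is only an illustration under stated assumptions: the names `order_of_play`, `play_stage`, `first_move` and `reaction` are hypothetical, controls are reduced to scalars instead of functions on $[t_\ell, t_{\ell+1})$, and the example mappings are made up. `first_move[k]` plays the role of the map that ignores the opponent's control, and `reaction[k]` the role of the follower's map that may depend on it.

```python
def order_of_play(mode, s1, s2, neutral):
    """Return (leader, follower) for the observed mode; (None, None) means simultaneous moves."""
    if mode in s1:
        return 1, 2
    if mode in s2:
        return 2, 1
    if mode in neutral:
        return None, None
    raise ValueError("mode does not belong to the partition of I")


def play_stage(mode, state, s1, s2, neutral, first_move, reaction):
    """Choose both controls for one stage, respecting who moves first."""
    leader, follower = order_of_play(mode, s1, s2, neutral)
    if leader is None:
        # No leader: neither player sees the other's control before moving.
        return {k: first_move[k](state, mode) for k in (1, 2)}
    u_leader = first_move[leader](state, mode)
    # The follower observes the leader's control before choosing his own.
    u_follower = reaction[follower](state, mode, u_leader)
    return {leader: u_leader, follower: u_follower}


# Hypothetical mappings, used only to exercise the stage logic.
FIRST_MOVE = {1: lambda x, i: -x, 2: lambda x, i: -0.5 * x}
REACTION = {1: lambda x, i, u: -x + 0.1 * u, 2: lambda x, i, u: -0.5 * x + 0.2 * u}
S1, S2, NEUTRAL = {1}, {2}, {3}
controls = play_stage(1, 2.0, S1, S2, NEUTRAL, FIRST_MOVE, REACTION)
```

Repeating `play_stage` at every sampled time, with the mode re-observed each time, reproduces steps 1 to 3 above.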

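Associated with a $\delta$-strategy pair $\gamma^\delta = (\gamma_1^\delta, \gamma_2^\delta)$ are thus defined two cost functions

$$J_k^\delta(t_\ell, \xi, i; \gamma_1^\delta, \gamma_2^\delta) = E\big[q_k^{r(T)}(x(T)) \mid x(t_\ell) = \xi,\ r(t_\ell) = i\big], \qquad k=1,2 \qquad (4.14)$$

defined for each sampled time $t_\ell$ and each possible state $(x(t_\ell) = \xi, r(t_\ell) = i)$.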
For each triple $(t_\ell, \xi, i)$ the costs (4.14) define a game in normal form. This class of games on $[0,T] \times X \times I$ will be called the $G(\delta)$-game associated with the dynamical system (4.1)-(4.7). A $G(\delta)$-game has exactly the same structure as the multistage game introduced in Section 3. Such a game is thus defined for each $\delta \in \Delta$, where $\Delta \triangleq \{T/N : N = 1, 2, \dots\}$. A strategy for player $k$ is defined as a sequence

$$\gamma_k = \{\gamma_k^\delta\}_{\delta \to 0,\ \delta \in \Delta} \qquad (4.15)$$

Furthermore, a strategy pair $\gamma = (\gamma_1, \gamma_2)$ is playable on $[0,T] \times X \times I$ if the limit (4.16) exists for each $k=1,2$, $t_\ell \in [0,T]$, $\xi \in X$, $i \in I$:

$$J_k(t_\ell, \xi, i; \gamma_1, \gamma_2) = \lim_{\delta \to 0,\ \delta \in \Delta} J_k^\delta(t_\ell, \xi, i; \gamma_1^\delta, \gamma_2^\delta) \qquad (4.16)$$

Definition 4.1: The Differential Game associated with the dynamical system (4.1)-(4.7) is the family $G = \{G(\delta)\}$ of all $G(\delta)$-games having cost functions (4.14), with $\delta \in \Delta$. □

We now introduce the concept of a "pure-feedback strategy" for such a game. Towards this end, let there be given two functions

$$\gamma_k(\cdot) : [0,T] \times X \times I \times U_{\bar k} \to U_k, \qquad k=1,2,$$

such that

$$i \in \mathcal{N} \cup S_k \ \Rightarrow\ \gamma_k(t,x,i,u_{\bar k}) = \tilde\gamma_k(t,x,i), \qquad (4.17)$$

and which are continuous in $t$, bounded, continuously differentiable in $x$ and $u_{\bar k}$, with bounded derivatives. The relation (4.17) must be linked to (4.13) as it expresses a similar rule: when $i$ is such that Pk is not the follower, the function $\gamma_k(\cdot)$ does not depend on $u_{\bar k}$.

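It is possible to associate a control law $u \in \mathcal{U}_1 \times \mathcal{U}_2$ with the pair $(\gamma_1(\cdot), \gamma_2(\cdot))$ by defining

$$u_k^i(t,x) \triangleq \tilde\gamma_k(t,x,i) \quad \text{if } i \in \mathcal{N} \cup S_k$$

$$u_k^i(t,x) \triangleq \gamma_k(t,x,i,u_{\bar k}^i(t,x)) \quad \text{if } i \in S_{\bar k} \qquad (4.18)$$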


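We can also associate a whole family of $\delta$-strategies with the pair $(\gamma_1(\cdot), \gamma_2(\cdot))$ by proceeding as follows: for $\delta = T/N \in \Delta$ and $\ell \in \{0,1,\dots,N-1\}$,

$$\gamma_k^{\delta,\ell}\big(x(t_\ell), r(t_\ell), u_{\bar k}^{\delta,\ell}(\cdot)\big) = u_k^{\delta,\ell}(\cdot)$$

with

$$u_k^{\delta,\ell}(t) = \gamma_k\big(t, x(t_\ell), r(t_\ell), u_{\bar k}^{\delta,\ell}(t)\big), \qquad t \in [t_\ell, t_{\ell+1}), \qquad (4.19)$$

$k=1,2$, $\bar k = 1,2$, $\bar k \ne k$.

The controls in (4.19) are well defined since the maps $\gamma_1(\cdot)$ and $\gamma_2(\cdot)$ satisfy (4.17).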

Therefore, a strategy pair $\gamma = (\gamma_1, \gamma_2)$ for the differential game $G$ can be associated with $(\gamma_1(\cdot), \gamma_2(\cdot))$. Thus, we pose

Definition 4.2: A strategy pair $\gamma = (\gamma_1, \gamma_2)$ is called a pure-feedback strategy for $G$ if there exist two functions $\gamma_1(\cdot)$ and $\gamma_2(\cdot)$ satisfying (4.17) and defining through (4.18) an admissible control law, such that $\gamma = (\gamma_1, \gamma_2)$ can be associated with $(\gamma_1(\cdot), \gamma_2(\cdot))$. □

We are now in a position to introduce the concepts of weak and strong equilibria (as counterparts of those introduced in Sections 2 and 3) for the differential game formulated above. We first have

Definition 4.3: A pure-feedback strategy pair $\gamma^* = (\gamma_1^*, \gamma_2^*)$ constitutes a weak equilibrium over $[0,T] \times X \times I$, if it is playable on $[0,T] \times X \times I$ and if, for every $\gamma_k$ such that $\pi_k(\gamma_k, \gamma_{\bar k}^*)$ is pure-feedback and playable on $[0,T] \times X \times I$, and for every $\tau \in [0,T)$, $\xi \in X$, $i \in I$, the following holds:

$$J_k(\tau, \xi, i; \gamma_1^*, \gamma_2^*) \le J_k(\tau, \xi, i; \pi_k(\gamma_k, \gamma_{\bar k}^*)), \qquad k=1,2. \qquad (4.20)$$


As noted in Section 3, the class of weak equilibria is very rich indeed, since it involves "informational nonuniqueness." A stronger concept that is free of informational nonuniqueness is that of a strong equilibrium, which was introduced in Section 3 for discrete-time (multistage) dynamic games. In order to introduce a similar concept in a differential game we have to use a limiting process as follows.


Assume that, at time $t \in [0,T]$, the value of $r(t)$ is observed to be in $S_k$, implying that Pk is the leader. Given the state $(x(t), r(t))$ and a weak equilibrium pair $\gamma^* = (\gamma_1^*, \gamma_2^*)$, we define an $\varepsilon$-deviation with value $\bar u_k$, for the leader, as a pure-feedback strategy $\gamma^D_{k;\varepsilon,\bar u_k}$ such that $\pi_k(\gamma^D_{k;\varepsilon,\bar u_k}, \gamma_{\bar k}^*)$ is playable on $[0,T] \times X \times I$ and satisfies
Remark 4.1: The notions of $\varepsilon$-deviation and $\varepsilon$-reaction are defined locally. They correspond to a temporary perturbation of the equilibrium strategy pair $\gamma^* = (\gamma_1^*, \gamma_2^*)$. For a length of time less than $\varepsilon$ the leader deviates from his equilibrium strategy, by playing $\bar u_k$. On the same time interval the follower responds by playing $\bar u_{\bar k}$. After time $t+\varepsilon$, or earlier if there is a jump in $r(\cdot)$, the original equilibrium strategy resumes. □
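We can now introduce

Definition 4.4: A pure-feedback strategy pair $\gamma^* = (\gamma_1^*, \gamma_2^*)$ is a strong equilibrium for the differential game $G$ if

(i) it is a weak equilibrium,

(ii) for $k=1,2$, for any $(t,\xi,i) \in [0,T] \times X \times S_k$, $\bar u_k \in U_k$ and $\bar u_{\bar k} \in U_{\bar k}$ there exists $\varepsilon' > 0$ such that for all $\varepsilon$, $0 < \varepsilon < \varepsilon'$, the $\varepsilon$-deviation with value $\bar u_k$ and the $\varepsilon$-reaction with value $\bar u_{\bar k}$ are well-defined pure-feedback strategies, and the following holds:

$$J_{\bar k}\big(t,\xi,i; \pi_k(\gamma^D_{k;\varepsilon,\bar u_k}, \gamma_{\bar k}^*)\big) \le J_{\bar k}\big(t,\xi,i; \pi_k(\gamma^D_{k;\varepsilon,\bar u_k}, \gamma^R_{\bar k;\varepsilon,\bar u_{\bar k}})\big) + o(\varepsilon) \qquad (4.25)$$

with

$$\lim_{\varepsilon \to 0} \frac{o(\varepsilon)}{\varepsilon} = 0 .$$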
* 

Remark 4.2: In a strong equilibrium, the strategy $\gamma_{\bar k}^*$ is the best response, locally, by P$\bar k$, when he is the follower, to a temporary deviation from the equilibrium by the leader (Pk). □

Remark 4.3: It should be noted that the concept of "strong equilibrium" (as well as that of weak equilibrium) is a limiting property of a $G$ game, since a strong equilibrium strategy pair $\gamma^* = (\gamma_1^*, \gamma_2^*)$ does not necessarily provide a strong equilibrium to every $G(\delta)$ game, for $\delta > 0$, in the sense of §3.2. But it turns out that this is a more convenient and versatile definition for verifying the validity of the results to be given in Thms. 4.1 and 4.2 in the sequel. □
4.3. Hamilton-Jacobi Equations

We are now in a position to obtain the Hamilton-Jacobi equations associated with the weak and strong equilibrium solutions of the differential game $G$. We first make two additional assumptions concerning the existence and admissibility of solutions.


A 4. The differential game $G$ admits at least one strong equilibrium in the class of pure-feedback strategies. □

A 5. For $k=1,2$, given any $(t,\xi,i) \in [0,T] \times X \times I$, and any $\bar u_k \in U_k$, there exist $h > 0$, $\varepsilon > 0$, and a playable pure-feedback strategy pair $\pi_k(\gamma_k, \gamma_{\bar k}^*)$ which gives rise to an admissible control law $u \in \mathcal{U}_1 \times \mathcal{U}_2$ such that

$$u_k^i(s,x) = \bar u_k \quad \text{for } s \in [t,t+h], \qquad k=1,2,$$

for all $x$ in an $\varepsilon$-neighborhood of $\xi$. □


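Theorem 4.1: Assume that A1-A5 hold true. A necessary and sufficient condition for a pure-feedback strategy pair $(\gamma_1^*(\cdot), \gamma_2^*(\cdot))$ to provide a weak equilibrium for the differential game $G$ is that for each $i \in I$, $k \in \{1,2\}$, the cost-to-go functions $V_k^{i*}(t,x)$ satisfy the following partial differential equations

$$\min_{u_k \in U_k} \Big\{ \frac{\partial}{\partial t} V_k^{i*}(t,x) + \frac{\partial}{\partial x} V_k^{i*}(t,x)\, f^i\big(t,x,\pi_k(u_k, \gamma_{\bar k}^*(t,x,i,u_k))\big) + \sum_{j \in I} \lambda_{ij}(t,x) V_k^{j*}(t,x) \Big\}$$

$$= \frac{\partial}{\partial t} V_k^{i*}(t,x) + \frac{\partial}{\partial x} V_k^{i*}(t,x)\, f^i\big(t,x,u_1^{i*}(t,x),u_2^{i*}(t,x)\big) + \sum_{j \in I} \lambda_{ij}(t,x) V_k^{j*}(t,x) = 0, \qquad (4.26)$$

where $k=1,2$, $\bar k = 1,2$, $\bar k \ne k$, and $u^* \in \mathcal{U}_1 \times \mathcal{U}_2$ is the feedback control law generated by $(\gamma_1^*(\cdot), \gamma_2^*(\cdot))$. The boundary conditions are

$$V_k^{i*}(T,x) = q_k^i(x), \qquad i \in I,\ x \in X,\ k=1,2. \qquad (4.27)$$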

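Proof of Necessity: Let $\bar u_k$ be a point in $U_k$ and $u$ be the control law defined by

$$u_k^i(s,x') = \bar u_k, \qquad u_{\bar k}^i(s,x') = \gamma_{\bar k}^*(s,x',i,\bar u_k), \qquad \text{if } t \le s \le t+h \text{ and } x' \text{ is in an } \varepsilon\text{-neighborhood of } x,$$

$$u_l^j(s,x') = u_l^{j*}(s,x'), \qquad l=1,2, \quad \text{otherwise.}$$

The control $u$ is admissible according to A5. Furthermore, it is the control law associated with a playable strategy pair $\pi_k(\gamma_k, \gamma_{\bar k}^*)$ which is also defined by two mappings

$$\gamma_k(s,x',i,u_{\bar k}) \equiv \bar u_k \qquad \text{if } t \le s \le t+h,\ x' \text{ is in an } \varepsilon\text{-neighborhood of } x,$$

$$\gamma_k(v,\xi,j,u_{\bar k}) = \gamma_k^*(v,\xi,j,u_{\bar k}) \qquad \text{elsewhere,}$$

$$\gamma_{\bar k}(v,\xi,j,u_k) = \gamma_{\bar k}^*(v,\xi,j,u_k) \qquad \text{everywhere.}$$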
Let $V_k^i(t,x)$ denote the expected cost-to-go corresponding to $u$. Because of the equilibrium property, $\pi_k(\gamma_k, \gamma_{\bar k}^*)$ is such that

$$J_k(\pi_k(\gamma_k, \gamma_{\bar k}^*)) \ge J_k(\gamma_1^*, \gamma_2^*).$$

By an argument similar to the one used by Rishel in the proof of Theorem 4 of [7], we obtain, when $h \to 0$,

$$0 \le \frac{\partial}{\partial t} V_k^{i*}(t,x) + \frac{\partial}{\partial x} V_k^{i*}(t,x)\, f^i\big(t,x,\pi_k(\bar u_k, \gamma_{\bar k}^*(t,x,i,\bar u_k))\big) + \sum_{j \in I} \lambda_{ij}(t,x) V_k^{j*}(t,x) \qquad (4.28)$$

Combining (4.28) with (4.10) of Lemma 4.1 gives (4.25).

Proof of Sufficiency: Let $u_k^i(t,x)$ be a control law for player $k$ and define $u_{\bar k}^i(t,x) = \gamma_{\bar k}^*(t,x,i,u_k^i(t,x))$. If the pair $\pi_k(u_k^i(t,x), u_{\bar k}^i(t,x))$ constitutes an admissible control law $u \in \mathcal{U}_1 \times \mathcal{U}_2$ then it defines an expected cost-to-go $V_k^i(t,x)$ which satisfies the inequality

$$V_k^i(t,x) \ge V_k^{i*}(t,x).$$

The proof of this inequality is a direct consequence of Lemma 4.2, as in Rishel's Theorem 4 [7]. □
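Theorem 4.2: Assume that A1-A5 hold true. A necessary and sufficient condition for a pure-feedback strategy pair $(\gamma_1^*(\cdot), \gamma_2^*(\cdot))$ to provide a strong equilibrium is that for each $i \in I$, $k \in \{1,2\}$, the cost-to-go functions $V_k^{i*}(t,x)$ satisfy the set of partial differential equations (4.26), together with the boundary condition (4.27), and that the following holds:

$$\forall (t,x) \in [0,T] \times X,\ \forall i \in S_k,\ \forall \bar u_k \in U_k,\ \forall \bar u_{\bar k} \in U_{\bar k}:$$

$$\frac{\partial}{\partial x} V_{\bar k}^{i*}(t,x)\, f^i\big(t,x,\pi_k(\bar u_k, \gamma_{\bar k}^*(t,x,i,\bar u_k))\big) \le \frac{\partial}{\partial x} V_{\bar k}^{i*}(t,x)\, f^i\big(t,x,\pi_k(\bar u_k, \bar u_{\bar k})\big) \qquad (4.29)$$

$k=1,2$, $\bar k = 1,2$, $\bar k \ne k$.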

Proof of Necessity: Consider, at a point $(t,x) \in [0,T] \times X$ and for $i \in S_k$, an $\varepsilon$-deviation with value $\bar u_k$ for player $k$ and an $\varepsilon$-reaction with value $\bar u_{\bar k}$ for player $\bar k$.

Let $x^D(t+s; x, i)$ and $x^{DR}(t+s; x, i)$, $s \ge 0$, denote the trajectories emanating from $(t,x)$, with $r(t) = i$, and generated, respectively, by $\pi_k(\gamma^D_{k;\varepsilon,\bar u_k}, \gamma_{\bar k}^*)$ and $\pi_k(\gamma^D_{k;\varepsilon,\bar u_k}, \gamma^R_{\bar k;\varepsilon,\bar u_{\bar k}})$.

According to Definition 4.4, we must have, for $\varepsilon$ sufficiently small,
Therefore, if (4.29) holds true, condition (4.24) is also satisfied. □
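where

$$T_{\bar k}^i\Big(t,x,u_k,\frac{\partial}{\partial x} V_{\bar k}^i\Big) \triangleq \arg\min_{u_{\bar k}} \Big[ \frac{\partial}{\partial x} V_{\bar k}^i(t,x) \cdot f^i\big(t,x,\pi_k(u_k,u_{\bar k})\big) + g_{\bar k}^i\big(t,x,\pi_k(u_k,u_{\bar k})\big) \Big] \qquad (5.6)$$

$$u_k^{i*} = \begin{cases} \text{arg of (5.2)}, & i \in \mathcal{N} \\ \text{arg of (5.3)}, & i \in S_k \\ T_k^i\big(t,x,u_{\bar k}^{i*}, \frac{\partial}{\partial x} V_k^i\big), & i \in S_{\bar k} \end{cases} \qquad (5.7)$$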


Now,  in  the  two  subsections  to  follow,  we  consider  two  special  cases, 
viz.  the  case  when  the  mode  of  play  is  fixed  (there  is  no  chance  variable) 
and  the  case  of  linear-quadratic  differential  games  with  the  mode  of  play 
determined  by  a  Markov  jump  process. 


5.1.  Deterministic  Differential  Games  with  a  Fixed  Mode  of  Play 

Assuming that there are no chance moves, we now differentiate between two prototypes, viz. the case when one of the players, say P1, is always the leader (i.e., he has an informational advantage, though only an incremental one), and the case of symmetric mode of play, which corresponds to the Nash equilibrium solution. For the latter, $S_1 = S_2 = \emptyset$, $\mathcal{N} = \{1,2\}$, $\lambda_{ij} = 0\ \forall i,j$, and the dynamic programming equations associated with a strong equilibrium solution $u^* = (u_1^*(t,x), u_2^*(t,x))$ are easily obtainable
$$\frac{\partial}{\partial t} V_k(t,x) = -\min_{u_k} \Big[ \frac{\partial}{\partial x} V_k(t,x) \cdot f\big(t,x,\pi_k(u_k, u_{\bar k}^*(t,x))\big) + g_k\big(t,x,\pi_k(u_k, u_{\bar k}^*(t,x))\big) \Big], \qquad k=1,2 \qquad (5.8)$$

$$V_k(T,x) = q_k(x) \qquad (5.9)$$

where $u_k^*(t,x) = \arg$ RHS of (5.8).

These relations characterize the so-called "feedback Nash equilibrium solution" [3,9], and in this case the concepts of weak and strong equilibria coincide.

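In the former case, however, $\mathcal{N} = \emptyset$, $S_1 = \{1\}$, $S_2 = \emptyset$, $\lambda_{ij} = 0\ \forall i,j$, and the dynamic programming equations associated with a strong equilibrium solution $u^* = (u_1^*(t,x), u_2^*(t,x))$ read [from (5.3)-(5.7)]:

$$\frac{\partial}{\partial t} V_1(t,x) = -\min_{u_1} \Big[ \frac{\partial}{\partial x} V_1(t,x) \cdot f\Big(t,x,u_1,T_2\big(t,x;u_1,\tfrac{\partial V_2}{\partial x}\big)\Big) + g_1\Big(t,x,u_1,T_2\big(t,x;u_1,\tfrac{\partial V_2}{\partial x}\big)\Big) \Big] \qquad (5.10)$$

$$\frac{\partial}{\partial t} V_2(t,x) = -\frac{\partial}{\partial x} V_2(t,x) \cdot f\big(t,x,u_1^*(t,x),u_2^*(t,x)\big) - g_2\big(t,x,u_1^*(t,x),u_2^*(t,x)\big) \qquad (5.11)$$

$$V_k(T,x) = q_k(x) \qquad (5.12)$$

where

$$T_2\Big(t,x;u_1,\frac{\partial V_2}{\partial x}\Big) = \arg\min_{u_2} \Big[ \frac{\partial V_2}{\partial x} f(t,x,u_1,u_2) + g_2(t,x,u_1,u_2) \Big] \qquad (5.13)$$

$$u_1^* = \arg \text{RHS of (5.10)} \qquad (5.14a)$$

$$u_2^* = T_2\Big(t,x;u_1^*(t,x),\frac{\partial}{\partial x} V_2(t,x)\Big). \qquad (5.14b)$$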

The solution $(u_1^*, u_2^*)$ satisfying the above relations may be called the continuous-time feedback Stackelberg solution, because it is the natural counterpart, in the continuous-time domain (that is, for differential games), of the discrete-time feedback Stackelberg solution, which is well established in the literature for discrete-time dynamic games [2,3]. Here, the leader has only an incremental informational advantage over the follower, at each instant of time, and he cannot announce his strategy ahead of time as in the case of the standard Stackelberg problem. Furthermore, it should be noted that weak and strong equilibria do not necessarily coincide here, and there exist in general infinitely many weak equilibria because of "informational nonuniqueness".

Since the asymmetry in the roles of the players in a continuous-time feedback Stackelberg solution is only incremental, one may be led to the conclusion that the feedback Stackelberg solution should coincide with (or be very close to) the feedback Nash solution. Such an implication is also evident in the analysis of Friedman in Chapter 8 of his book [10, p. 290], where he defines the "stable equilibrium value" of an N-person deterministic differential game as (using our terminology, naturally extended to N-person games) the limit (as $\delta \to 0$) of the N-tuple cost values associated with strong equilibria of $G(\delta)$ differential games, independent of the nature of the asymmetry in the roles of the players (i.e., independent of the order in which the players announce their decisions stagewise). Friedman also shows that for linear-quadratic differential games such a value exists whenever $T$ is sufficiently small, and that it is the "Nash" value associated with the feedback Nash equilibrium solution [10, Thm. 8.7.1]. However, such a result does not hold for more general classes of games, as can be observed by comparing the conditions (5.8) and (5.10)-(5.13); in particular, if $T_2$ defined by (5.13) is functionally dependent on $u_1$ [which is not the case in the strictly linear-quadratic problem considered by Friedman], the solutions of (5.8)-(5.9) and (5.10)-(5.12) will in general be different¹, which implies that the strong equilibrium solution under an asymmetric mode of play (i.e., the feedback Stackelberg solution) is in general different from the strong equilibrium solution under a symmetric mode of play (i.e., the feedback Nash solution). To illustrate this point, let us consider again the linear-quadratic structure, but somewhat more general than that of [10, Thm. 8.7.1]:

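$$f = Ax - B_1u_1 - B_2u_2 \qquad (5.15a)$$

$$g_k = \tfrac{1}{2}\big(x'Q_kx + u_k'u_k - 2u_k'R_{k\bar k}u_{\bar k}\big), \qquad k=1,2 \qquad (5.15b)$$

$$q_k = \tfrac{1}{2}x'C_kx; \qquad Q_k \ge 0,\ C_k \ge 0 \qquad (5.15c)$$

¹ Assuming, of course, that neither $J_1 = -J_2$ nor $J_1 \equiv J_2$, i.e., the underlying problem is not a team or a zero-sum game.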
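where capital letters denote matrices of appropriate dimensions. Then,

$$T_2 = \arg\min_{u_2} \Big\{ \frac{\partial V_2}{\partial x}[Ax - B_1u_1 - B_2u_2] + \tfrac{1}{2}x'Q_2x + \tfrac{1}{2}u_2'u_2 - u_2'R_{21}u_1 \Big\} = R_{21}u_1 + B_2'\Big(\frac{\partial V_2}{\partial x}\Big)' \qquad (5.16)$$

Substituting this into the RHS of (5.10) and evaluating the minimizing $u_1$ we obtain

$$u_1^* = (I - R_{12}R_{21} - R_{21}'R_{12}')^{-1}\Big[R_{12}B_2'\Big(\frac{\partial V_2}{\partial x}\Big)' + (B_1' + R_{21}'B_2')\Big(\frac{\partial V_1}{\partial x}\Big)'\Big] \triangleq -K_1\Big(\frac{\partial V_1}{\partial x}\Big)' - K_2\Big(\frac{\partial V_2}{\partial x}\Big)' \qquad (5.17)$$

assuming that the required inverse exists. The corresponding $u_2^*$ is then [from (5.16) and (5.17)]

$$u_2^* = -R_{21}K_1\Big(\frac{\partial V_1}{\partial x}\Big)' + (B_2' - R_{21}K_2)\Big(\frac{\partial V_2}{\partial x}\Big)' \triangleq -L_1\Big(\frac{\partial V_1}{\partial x}\Big)' - L_2\Big(\frac{\partial V_2}{\partial x}\Big)' \qquad (5.18)$$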
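The leader's minimization can be checked numerically in the scalar case. The sketch below is an illustration under stated assumptions, not the report's own computation: all matrices are replaced by scalars, the costate values `p1` and `p2` stand in for $\partial V_1/\partial x$ and $\partial V_2/\partial x$, the function names are hypothetical, and the parameter values are arbitrary. It compares the first-order-condition formula for the leader's control with a brute-force grid search over the leader's objective, in which the follower's reaction has already been substituted.

```python
def follower_reaction(u1, p2, b2, r21):
    """Follower's minimizer for a given u1 (scalar first-order condition)."""
    return r21 * u1 + b2 * p2


def leader_cost(u1, x, p1, p2, a, b1, b2, q1, r12, r21):
    """Leader's objective with the follower's reaction substituted in."""
    u2 = follower_reaction(u1, p2, b2, r21)
    drift = a * x - b1 * u1 - b2 * u2
    running = 0.5 * (q1 * x * x + u1 * u1 - 2.0 * u1 * r12 * u2)
    return p1 * drift + running


def leader_closed_form(p1, p2, b1, b2, r12, r21):
    """Scalar first-order condition; requires 1 - 2*r12*r21 > 0 for a minimum."""
    return ((b1 + r21 * b2) * p1 + r12 * b2 * p2) / (1.0 - 2.0 * r12 * r21)


def leader_grid_search(x, p1, p2, a, b1, b2, q1, r12, r21,
                       lo=-10.0, hi=10.0, steps=20001):
    """Brute-force minimizer of leader_cost on a uniform grid."""
    best_u, best_cost = lo, float("inf")
    for n in range(steps):
        u1 = lo + (hi - lo) * n / (steps - 1)
        cost = leader_cost(u1, x, p1, p2, a, b1, b2, q1, r12, r21)
        if cost < best_cost:
            best_u, best_cost = u1, cost
    return best_u


# Arbitrary illustrative values.
PARAMS = dict(x=1.0, p1=0.8, p2=0.5, a=0.2, b1=1.0, b2=1.0, q1=1.0, r12=0.3, r21=0.2)
u1_formula = leader_closed_form(0.8, 0.5, 1.0, 1.0, 0.3, 0.2)
u1_numeric = leader_grid_search(**PARAMS)
```

The two values agree to within the grid spacing, and setting `r21 = 0` removes the dependence of the follower's reaction on `u1`, which is the case in which the leader's advantage disappears.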


Resubstituting  (5.17)  and  (5.18)  into  (5.10)  and  (5.11),  we  finally  obtain 
the  coupled  set  of  partial  differential  equations: 
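$$\frac{\partial V_1}{\partial t} = -\frac{\partial V_1}{\partial x}\Big[Ax + \bar B_1\Big(\frac{\partial V_1}{\partial x}\Big)' + \bar B_2\Big(\frac{\partial V_2}{\partial x}\Big)'\Big] - \tfrac{1}{2}x'Q_1x - \tfrac{1}{2}\Big[K_1\Big(\frac{\partial V_1}{\partial x}\Big)' + K_2\Big(\frac{\partial V_2}{\partial x}\Big)'\Big]'\Big[K_1\Big(\frac{\partial V_1}{\partial x}\Big)' + K_2\Big(\frac{\partial V_2}{\partial x}\Big)'\Big]$$

$$\qquad + \Big[K_1\Big(\frac{\partial V_1}{\partial x}\Big)' + K_2\Big(\frac{\partial V_2}{\partial x}\Big)'\Big]' R_{12} \Big[L_1\Big(\frac{\partial V_1}{\partial x}\Big)' + L_2\Big(\frac{\partial V_2}{\partial x}\Big)'\Big] \qquad (5.19)$$

$$\frac{\partial V_2}{\partial t} = -\frac{\partial V_2}{\partial x}\Big[Ax + \bar B_1\Big(\frac{\partial V_1}{\partial x}\Big)' + \bar B_2\Big(\frac{\partial V_2}{\partial x}\Big)'\Big] - \tfrac{1}{2}x'Q_2x - \tfrac{1}{2}\Big[L_1\Big(\frac{\partial V_1}{\partial x}\Big)' + L_2\Big(\frac{\partial V_2}{\partial x}\Big)'\Big]'\Big[L_1\Big(\frac{\partial V_1}{\partial x}\Big)' + L_2\Big(\frac{\partial V_2}{\partial x}\Big)'\Big]$$

$$\qquad + \Big[L_1\Big(\frac{\partial V_1}{\partial x}\Big)' + L_2\Big(\frac{\partial V_2}{\partial x}\Big)'\Big]' R_{21} \Big[K_1\Big(\frac{\partial V_1}{\partial x}\Big)' + K_2\Big(\frac{\partial V_2}{\partial x}\Big)'\Big] \qquad (5.20)$$

$$V_k(T,x) = \tfrac{1}{2}x'C_kx, \qquad \bar B_1 \triangleq B_1K_1 + B_2L_1, \qquad \bar B_2 \triangleq B_1K_2 + B_2L_2 .$$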


For the feedback Nash equilibrium solution, however, the relevant set of PDE's can be derived using (5.8), and found to be in the same form as (5.19) but with $K_1$, $K_2$, $L_1$, $L_2$, $\bar B_1$ and $\bar B_2$, respectively replaced by the "hatted" quantities

$$\hat K_1 = -(I - R_{12}R_{21})^{-1}B_1', \qquad \hat K_2 = -(I - R_{12}R_{21})^{-1}R_{12}B_2'$$

$$\hat L_1 = -(I - R_{21}R_{12})^{-1}R_{21}B_1', \qquad \hat L_2 = -(I - R_{21}R_{12})^{-1}B_2'$$

$$\hat B_1 = B_1\hat K_1 + B_2\hat L_1, \qquad \hat B_2 = B_1\hat K_2 + B_2\hat L_2 .$$

Note that if the cross terms in the cost functions are absent (i.e., $R_{12} = 0$, $R_{21} = 0$), we have the simple relations

$$K_1 = \hat K_1 = -B_1', \qquad K_2 = \hat K_2 = 0, \qquad L_1 = \hat L_1 = 0, \qquad L_2 = \hat L_2 = -B_2',$$

which imply that the two sets of PDE's become identical, thus admitting the same set of solutions. This then corroborates Friedman's result mentioned earlier. If the cross terms are not absent, however, the two sets of PDE's are intrinsically different, and admit different sets of solutions. Hence, even in linear-quadratic games with generalized quadratic cost functionals, the feedback Stackelberg and Nash solutions may be different (or, in Friedman's terminology, a stable equilibrium value may not exist, no matter how small $T$ is).

We now conclude this subsection by reporting a result on the existence and structure of the feedback Stackelberg solution of the linear-quadratic differential game described by (5.15).

Proposition 5.1. If $T$ is sufficiently small and the matrix inverse in (5.17) exists, the linear-quadratic differential game described by (5.15), and with P1 as the leader, admits a feedback Stackelberg solution given by

$$u_1^*(t,x) = -(K_1P_1 + K_2P_2)x \qquad (5.21a)$$

$$u_2^*(t,x) = -(L_1P_1 + L_2P_2)x, \qquad (5.21b)$$

where $\{P_1(t), P_2(t)\}$ are symmetric solutions of the coupled set of Riccati equations
$$\dot P_1 = -P_1L - L'P_1 - Q_1 - (K_1P_1 + K_2P_2)'(K_1P_1 + K_2P_2) + (L_1P_1 + L_2P_2)'R_{12}'(K_1P_1 + K_2P_2)$$

$$\qquad + (K_1P_1 + K_2P_2)'R_{12}(L_1P_1 + L_2P_2); \qquad P_1(T) = C_1 \qquad (5.22a)$$

$$\dot P_2 = -P_2L - L'P_2 - Q_2 - (L_1P_1 + L_2P_2)'(L_1P_1 + L_2P_2) + (L_1P_1 + L_2P_2)'R_{21}(K_1P_1 + K_2P_2)$$

$$\qquad + (K_1P_1 + K_2P_2)'R_{21}'(L_1P_1 + L_2P_2); \qquad P_2(T) = C_2 \qquad (5.22b)$$

$$L \triangleq A + \bar B_1P_1 + \bar B_2P_2 \qquad (5.23)$$

Proof: This result follows by substituting $V_k = \tfrac{1}{2}x'P_kx$ into the PDE's (5.19) and observing that (5.22) imply satisfaction of (5.19) by such a quadratic cost-to-go function. Existence of a (unique) solution to (5.22) when $T$ is sufficiently small follows from a standard property of ordinary differential equations with continuous right-hand sides. □

Remark 5.1: Proposition 5.1 has a natural counterpart in the context of feedback Nash equilibria, simply with $K_i$, $L_i$ and $\bar B_i$ ($i=1,2$) replaced by their corresponding "hatted" versions introduced earlier. The solution to the resulting set of Riccati equations will not be the same as the solution to (5.22), unless $R_{12} = 0$, $R_{21} = 0$. □
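The comparison in Remark 5.1 can be tried out on a scalar example. The following sketch rests on several assumptions that are not in the report: all matrices are scalars, the gains are obtained from the scalar first-order conditions with the sign convention $u_1^* = -(K_1P_1 + K_2P_2)x$, the coupled Riccati equations are integrated backward from $P_k(T) = C_k$ with a plain explicit Euler scheme, and every function name and parameter value is hypothetical.

```python
def stackelberg_gains(b1, b2, r12, r21):
    """Scalar gains (K1, K2, L1, L2) with player 1 as leader."""
    d = 1.0 - 2.0 * r12 * r21
    k1 = -(b1 + r21 * b2) / d
    k2 = -(r12 * b2) / d
    return k1, k2, r21 * k1, r21 * k2 - b2


def nash_gains(b1, b2, r12, r21):
    """Scalar 'hatted' gains for the symmetric (Nash) mode of play."""
    d = 1.0 - r12 * r21
    return -b1 / d, -(r12 * b2) / d, -(r21 * b1) / d, -b2 / d


def solve_riccati(gains, a, b1, b2, q1, q2, c1, c2, r12, r21, horizon, steps=2000):
    """Integrate the scalar coupled Riccati equations backward from t = T to t = 0."""
    k1, k2, l1, l2 = gains
    p1, p2 = c1, c2
    dt = horizon / steps
    for _ in range(steps):
        k = k1 * p1 + k2 * p2            # u1 = -k * x
        l = l1 * p1 + l2 * p2            # u2 = -l * x
        closed_loop = a + b1 * k + b2 * l
        dp1 = -2.0 * p1 * closed_loop - q1 - k * k + 2.0 * k * r12 * l
        dp2 = -2.0 * p2 * closed_loop - q2 - l * l + 2.0 * l * r21 * k
        p1, p2 = p1 - dt * dp1, p2 - dt * dp2   # step backward in time
    return p1, p2


def compare(r12, r21, horizon=0.1):
    """Return (Stackelberg, Nash) solutions (P1(0), P2(0)) for unit data."""
    common = dict(a=0.0, b1=1.0, b2=1.0, q1=1.0, q2=1.0, c1=1.0, c2=1.0,
                  r12=r12, r21=r21, horizon=horizon)
    stack = solve_riccati(stackelberg_gains(1.0, 1.0, r12, r21), **common)
    nash = solve_riccati(nash_gains(1.0, 1.0, r12, r21), **common)
    return stack, nash
```

Calling `compare(0.0, 0.0)` returns two identical pairs, while `compare(0.3, 0.3)` returns different ones, in line with the remark that the two solutions agree only when the cross terms vanish.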
5.2. Linear-Quadratic Differential Games with the Mode of Play Determined by Markov Jump Processes

Consider the class of deterministic differential games wherein the mode of play is determined by the output of a 3-state Markov jump process with constant parameters $\lambda_{ij}(t,x) = \lambda_{ij}$, with the three possible states corresponding to the three different modes of play: Stackelberg with P1 as leader ($i=1$), Stackelberg with P2 as leader ($i=2$), and Nash equilibrium ($i=3$). Hence, in terms of the terminology we have adopted for our general formulation, $I = \{1,2,3\}$, $S_1 = \{1\}$, $S_2 = \{2\}$, $\mathcal{N} = \{3\}$, and $f^r$, $q_k^r$ and $g_k^r$ are independent of $r$. Then, the related Hamilton-Jacobi equation that yields the strong equilibria is (5.2)-(5.7), with $\lambda_{ij}$ independent of $(t,x)$ and $f^i$, $g_k^i$, $q_k^i$ not depending on $i$. We now study these equations in more detail for the special class of linear-quadratic differential games described by (5.15) but without the cross terms in control (i.e., $R_{12} = 0$, $R_{21} = 0$).

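First, we evaluate

$$T_k^i(t,x,u_{\bar k}) = \arg\min_{u_k} \Big[ \frac{\partial V_k^i}{\partial x}(Ax - B_1u_1 - B_2u_2) + \tfrac{1}{2}x'Q_kx + \tfrac{1}{2}u_k'u_k - u_k'R_{k\bar k}u_{\bar k} \Big] = R_{k\bar k}u_{\bar k} + B_k'\Big(\frac{\partial V_k^i}{\partial x}\Big)'$$

and then substitute this into (5.2)-(5.5) to obtain, after performing the minimizations,

which depend on the observed values of $x(t)$ and $r(t)$, denoted by $x$ and $i$, respectively. The following proposition now summarizes the result.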


Proposition 5.2: If the inverses in (5.26) exist and the coupled set of differential equations (5.27) admits a symmetric solution in the interval $[0,T]$, the linear-quadratic differential game, wherein the modes of play are determined by the outcome of the Markov jump process $\{r(\cdot)\}$, admits a strong equilibrium solution given by (5.29). □


Remark 5.2: If the cross terms in (5.15b) are absent, i.e., $R_{12} = 0$, $R_{21} = 0$, it follows from (5.26a) and (5.26b) that $N_k^i$ and $N_{\bar k}^i$ are independent of $i$, which implies that the set of equations (5.27) are also independent of $i$. Hence, the solution (5.29) does not depend on the observed value $i$ of the Markov jump process $\{r(\cdot)\}$. Furthermore, since $P_k^j$ is independent of $j$ and $\sum_{j=1}^{3} \lambda_{ij} = 0$, (5.27) becomes the same equation set as (5.22) with $R_{12} = R_{21} = 0$. The implication then is that the strong equilibrium solution, for this special case, is identical with the feedback Nash equilibrium (or equivalently, the feedback Stackelberg equilibrium) solution. □
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6.  Concluding  Remarks 

The  objectives  of  this  study  have  been  two-fold:  Firstly,  to 
provide  a  general  definition  of  an  equilibrium  solution  for  discrete-time 
dynamic  games  which  would  encompass  both  feedback  Nash  and  feedback  Stackelberg 
solutions  and  also  be  extendable  to  games  in  which  both  the  underlying 
system  dynamics  and  the  mode  of  play  are  determined  (nondeterministically) 
as  the  outcome  of  a  finite  state  stochastic  jump  process;  this  has  been 
accomplished  in  Sections  2  and  3  which  also  contain  the  optimality  equations 
for  such  games.  Secondly,  to  introduce  a  feedback  Stackelberg  equilibrium 
concept  for  (continuous-time)  differential  games  with  a  fixed  asymmetric 
mode  of  play,  and  to  obtain  the  associated  optimality  conditions;  this  has 
been  achieved  in  Sections  4  and  5  by  formulating  a  general  stochastic 
differential  game  with  structural  and  modal  uncertainties  and  by  associating 
the  (strong)  equilibrium  solution  with  the  limiting  solution  of  a  sequence 
of discretized ($G(\delta)$) games. This has led to an indirect derivation of a
pair  of  Hamilton-Jacobi  equations,  which  characterizes  the  set  of  optimality 
conditions  for  the  differential  game. 

An  important  by-product  of  our  analysis  is  the  observation  that  the 
feedback  Nash  solution  is  not  in  general  the  same  as  the  feedback  Stackelberg 
solution,  in  continuous-time  dynamic  games,  unless  the  state  equation  and  the 
cost  functionals  are  of  particular  forms,  as  discussed  in  Section  5.  Hence, 
the  mode  of  play  is  a  crucial  factor  in  the  characterization  of  equilibria  in 
differential  games,  and  if,  for  example,  the  Nash  equilibrium  is  defined  as 
the  limit  of  the  equilibrium  solutions  of  a  sequence  of  discretized  (in  time) 
games,  the  end  result  will  very  much  depend  on  the  information  structure  to  be 
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adopted  in  the  solution  of  these  games  [i.e.,  whether  or  not  one  player  has 
informational  advantage  over  the  other  in  terms  of  observing  his  actions]. 

In  other  words,  in  nonzero-sum  differential  games  the  equilibrium  value 
cannot  in  general  be  defined  independently  of  the  information  structures  of 
the  sub-games  —  quite  contrary  to  what  has  been  a  common  practice  in  zero-sum 
differential  games. 

In  the  context  of  zero-sum  differential  games,  it  is  possible  to 
define  the  (saddle-point)  value  in  several  different  ways,  as  the  limit  of 
saddle-point  equilibria  of  a  sequence  of  discrete-time  sub-games  with  various 
information structures [see [10], [14]-[17] for such different definitions,
the  actual  difference  lying  in  the  information  allowed  to  the  players  in  the 
discretized  games].  Such  different  limiting  procedures  all  lead  to  the  same 
numerical  "value"  for  the  original  differential  game,  even  though  the 
existence  conditions  (for  the  limit)  are  different  under  different  schemes. 
Motivated  by  this  result  (which  is  an  inherent  property  of  the  saddle-point 
value  in  zero-sum  differential  games),  Friedman  has  attempted  in  [10,  Chapter  8] 
to  introduce  the  concept  of  "stable  (Nash)  equilibrium"  in  nonzero-sum 
differential  games  by  relating  it  to  the  limit  of  equilibria  of  discretized 
games.  In  each  such  discretized  game,  Friedman  has  adopted  a  strictly 
hierarchical  mode  of  play  at  each  stage,  with  the  players  moving  in  a 
predetermined  sequential  manner,  and  he  has  defined  the  "value"  as  one  which 
is  attained  in  the  limit  (as  the  discretized  games  converge  to  the  original 
continuous-time  game)  independently  of  the  order  in  which  the  players  act  at 
each stage of the discretized games. Furthermore, he has shown that such a definition
is meaningful in the case of a special class of linear-quadratic games, since
the feedback Nash value is then independent of the order of play. For more
general types of nonzero-sum differential games, however, this will no longer
be true, as can be concluded from our analyses and results of Sections 4 and 5.
Our  contention  is  that  the  equilibrium  value  will  in  general  be  dependent  on 
the  mode  of  play  adopted  for  the  discretized  versions  of  the  continuous-time 
nonzero-sum  game,  and  hence  the  stable  equilibrium,  as  defined  by  Friedman, 
will  exist  only  in  a  very  restricted  class  of  problems. 

Our indirect definition of "strong equilibrium" in Section 4, together with
the derivation of the associated Hamilton-Jacobi equation, has
bypassed the seemingly difficult task of proving existence of a limit to the
sequence of G(δ) games and identifying the solution of the Hamilton-Jacobi
equation  as  the  limit  of  the  equilibrium  values  of  these  games.  This  is  a 
challenging  task  that  needs  to  be  undertaken  in  the  future  in  order  to 
complete  the  theory  presented  here.  An  extension  of  our  analysis  to  general 
M-player  nonzero-sum  games  vested  with  a  variety  of  modes  of  play,  however,  is 
rather  routine,  and  one  may  expect  to  arrive  at  similar  qualitative  conclusions 
from  such  an  extended  analysis;  we  have  chosen  not  to  do  it  here  in  order 
not  to  bury  the  essential  ideas  in  notational  complexity.  An  application  of 
the  theory  of  Section  4  and  some  of  the  results  of  Section  5  to  certain 
problems arising in economics can be found in [18].

7. Appendix I

In  this  appendix  we  formulate  a  2N-stage  stochastic  dynamic  game 
whose  feedback  Nash  equilibrium  solution  coincides  with  the  strong  equilibrium 
solution  of  §3.2.  In  this  regard,  this  result  can  be  viewed  as  an  extension 
of Proposition 2.2 to stochastic dynamic games with variable modes of play, as
formulated  in  §3.1. 

Consider  the  following  state  equation  and  cost  functionals: 

t 

_  _  o  o  r  f  * 

State  Equation:  z(s+l)  =  ?s [z(s) ,u^(s) ,u2(s) ,w(s) ]  ;  z(o)  =  [x  ,r  ,o  ,o  ] 

where 


z (s )  +  [o^  ,  r^u^s)  ,o  )  ]  ;  s  even 

'  -  “k 


r (s)  e  S, 


Fs [z,ul,U2] 


m  =m  +1, 
o  o 


and  for  .<?  odd, 


z(s) 


;  s  even,  r(k) 


»  Iff 

[w  (s),  o  ,o  ]  ;  s  odd 

ml  m2 


P(w(s)  €  (dX,i)  |  z(s)  ,u1(s)  ,u2(s) )  =  Q[dX,i;y(s)  ,r(s)  .u^s)  ,u2(s)  ] ; 

r(s)  € 


Q[dX,i;y(s)  ,r(s)  ,T^uk(s-l)  ,u^-(s) ) 


r(s)  6  S, 


m  ’  m  x(m,+m0+l) ' 
o  o  1  2 


-S, 

r(y) 


s  even 


_  »  ti 

r  (s)  *  (o  ,  1,  om  ,  o  ) 2 ( s ) 
mQ  nij  m2 


s  odd . 


u,  (s-1)  *  (°  — »  if,  (0  I  ))z(s) 

tc  ^k'^o  *  mkxmk  "he 


Note  that 


•  s  •  g  •  •  » 

(  [x  (-r),  r  (-r) ,  o  » o  ]  ,  s  even 

1  2  2 


z(s)  d  < 


r(^~) ,  o  s  odd,  r(^i)  € 


\  [x  (^).r(^),  o  ,o  ],  s  odd,  r(^)  6^ 


and 


uk(!} 


s  even,  rCy)  € 


Uk(s)  "  < 


uk(Si)  s  oddi  r(Ei1)  €  sk 


Cost Functionals: k=1,2

$$J_k = E\Big\{ q_k[z(2N)] + \sum_{s=0}^{2N-1} g_{k,s}[z(s),u_1(s),u_2(s)] \Big\}, \quad k=1,2,$$

where


3k[z(2N)] 


V(I« 


mox(m1+m2+l) 


)  z(2N)  ] 


?k,s-l  t(Im  *°m  x(m.+m.+l))2(s),Ul(s),U2(s)J’  s  odd’ 
— -  ■  o  o  1  2 

r(s)  e  S, 


8k,s[2’Vu2] 


(  8k,S|l  [(Imo*°mox(m1+m2+l))2(s)'  \(\(s-l)  ,^(s) )  ] , 


s  odd,  r(s)  €  S— 


0  ,  s  even 


Admissible  Control  Laws: 
For  Pk  at  stage  s: 


'  Yk,s(z(s))’ 


;  if  r(s)  €  s  ,  s  even. 


'  void;  otherwise. 


or  F(s)  6  S^-,  s  odd 


Let $\Gamma_{k,s}$ denote the corresponding strategy space of Pk. Then, the counterpart
(extension) of Proposition 2.2 in this case would be the following:

Proposition 7.1:

If $\{J_k, \Gamma_{k,n};\ k=1,2;\ n=0,1,\ldots,N-1\}$ is an N-stage stochastic
dynamic feedback game as defined in §3.1-§3.2, admitting a unique strong
equilibrium solution $(\gamma_1^*,\gamma_2^*)$, there exists a 2N-stage stochastic dynamic
feedback game $\{J_k, \Gamma_{k,s};\ k=1,2;\ s=0,1,\ldots,2N-2\}$ as defined above, which
admits a unique feedback Nash (strong) equilibrium solution $(\tilde\gamma_1^*,\tilde\gamma_2^*)$.
Moreover, there is a unique correspondence between these two solutions,
given by


Yk,s  ([x(2K  °m1,0m^>  =  Y*#s(x(f),  i);  i  6  Sk,  s 


even 


"*  r  ,/S-l 


s-1 


,SCX(~)*  i’  irk(o»k’  Yk,szl(x("l_  •  i))] 


=  y!!  _  ,  (xC-^-1- ),i)  ;  i  6  Sr,  s  odd 


rk,s-l  v*'v  2 
2 


1,O"1,0”2)  *  1)!  is7?' 


s  even . 


s = 0,1,...,2N-2; k = 1,2.

8. Appendix II

In this appendix we provide a proof for Lemma 4.1 of Section 4,
which follows the lines of the proof given in [7] for Theorems 1 and 2 (see
also [8]), with the only difference being that we have to account for the
dependence of $\lambda_{ij}$ on t and x (whereas in [7] and [8] the $\lambda_{ij}$'s were taken as
constants).

Let $u = (u_1(t,x), u_2(t,x)) \in \mathcal{U}_1 \times \mathcal{U}_2$ be an admissible control law,
let (t,x) be a point at which a jump to r(t,x)=i has occurred, and let $x^i[s;t,x]$
denote the corresponding state trajectory ($s \ge t$). Then, the probability
density of the event that the first jump of r after time t is from i to j
and occurs at time s > t is given by

$$\lambda_{ij}(s, x^i[s;t,x]) \exp\Big[\int_t^s \lambda_{ii}(\sigma, x^i[\sigma;t,x])\, d\sigma\Big] \tag{8.1a}$$

and the probability of the event that there are no jumps in the interval
(t,s] is

$$\exp\Big[\int_t^s \lambda_{ii}(\sigma, x^i[\sigma;t,x])\, d\sigma\Big], \tag{8.1b}$$

which follow directly from the transition probabilities (4.7) and (4.8).

If there are no jumps after time t (until the terminal time T),
the cost-to-go for Pk is clearly $q_k^i(x^i[T;t,x])$, whereas if a jump from i to
j has occurred at time s, t<s<T, the cost-to-go from s onwards is (by
definition) $V_k^j(s, x^i[s;t,x])$. Hence, using (8.1a) and (8.1b), $V_k^i$ satisfies
the integral equation

$$V_k^i(t,x) = q_k^i(x^i[T;t,x]) \exp\Big[\int_t^T \lambda_{ii}(\sigma,x^i[\sigma;t,x])\,d\sigma\Big] + \sum_{j\ne i}\int_t^T \lambda_{ij}(s,x^i[s;t,x]) \exp\Big[\int_t^s \lambda_{ii}(\sigma,x^i[\sigma;t,x])\,d\sigma\Big] V_k^j(s,x^i[s;t,x])\, ds. \tag{8.2}$$

Using (8.2), we can now compute

$$\int_t^T \lambda_{ii}(v,x^i[v;t,x])\, V_k^i(v,x^i[v;t,x])\, dv = q_k^i(x^i[T;t,x]) \int_t^T \lambda_{ii}(v,x^i[v;t,x]) \exp\Big[\int_v^T \lambda_{ii}(\sigma,x^i[\sigma;t,x])\,d\sigma\Big] dv + \sum_{j\ne i}\int_t^T\!\!\int_v^T \lambda_{ii}(v,x^i[v;t,x])\,\lambda_{ij}(s,x^i[s;t,x]) \exp\Big[\int_v^s \lambda_{ii}(\sigma,x^i[\sigma;t,x])\,d\sigma\Big] V_k^j(s,x^i[s;t,x])\, ds\, dv. \tag{8.3}$$

By an interchange of the order of integration in the double integral in
(8.3) we get for this term

$$\sum_{j\ne i}\int_t^T \lambda_{ij}(s,x^i[s;t,x])\, V_k^j(s,x^i[s;t,x]) \int_t^s \lambda_{ii}(v,x^i[v;t,x]) \exp\Big[\int_v^s \lambda_{ii}(\sigma,x^i[\sigma;t,x])\,d\sigma\Big] dv\, ds$$

which, after integration with respect to v, yields

$$\sum_{j\ne i}\int_t^T \lambda_{ij}(s,x^i[s;t,x]) \Big(-1 + \exp\Big[\int_t^s \lambda_{ii}(\sigma,x^i[\sigma;t,x])\,d\sigma\Big]\Big) V_k^j(s,x^i[s;t,x])\, ds, \tag{8.4}$$

while the first term on the RHS of (8.3) is equal to

$$q_k^i(x^i[T;t,x]) \Big(-1 + \exp\Big[\int_t^T \lambda_{ii}(\sigma,x^i[\sigma;t,x])\,d\sigma\Big]\Big). \tag{8.5}$$

Therefore, by (8.4), (8.5) and using again (8.2) we find that (8.3) is
equal to

$$V_k^i(t,x) - q_k^i(x^i[T;t,x]) - \sum_{j\ne i}\int_t^T \lambda_{ij}(s,x^i[s;t,x])\, V_k^j(s,x^i[s;t,x])\, ds \tag{8.6}$$

which finally yields the integral equation satisfied by $V_k^i(t,x)$:

$$V_k^i(t,x) = q_k^i(x^i[T;t,x]) + \sum_{j}\int_t^T \lambda_{ij}(s,x^i[s;t,x])\, V_k^j(s,x^i[s;t,x])\, ds. \tag{8.7}$$

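This implies that $V_k^i(t,x)$ is piecewise continuously differentiable in t.
We thus have, by (8.7),

$$\frac{d}{ds} V_k^i(s,x^i[s;t,x]) = \frac{\partial V_k^i}{\partial s} + \frac{\partial V_k^i}{\partial x}\,\frac{dx^i(s)}{ds} = -\sum_{j} \lambda_{ij}(s,x^i[s;t,x])\, V_k^j(s,x^i[s;t,x]).$$

Evaluating this at s = t we obtain (4.10).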
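The step from (8.3) to (8.4) and (8.5) rests on the elementary identity $\int_t^s \lambda_{ii}(v)\exp[\int_v^s \lambda_{ii}(\sigma)d\sigma]\,dv = -1 + \exp[\int_t^s \lambda_{ii}(\sigma)d\sigma]$. The following Python sketch checks it numerically under simplifying assumptions that are not from the report: the rate `rate_ii` is a hypothetical function of time only (the dependence on the state trajectory is suppressed), and the function names are illustrative.

```python
import math

def rate_ii(v):
    """Hypothetical diagonal rate lambda_ii(v) <= 0 (state dependence suppressed)."""
    return -(1.0 + 0.5 * v)

def integrated_rate(a, b):
    """Closed form of the integral of rate_ii over [a, b]."""
    return -((b - a) + 0.25 * (b * b - a * a))

def inner_integral(t, s, n=20000):
    """Midpoint-rule value of int_t^s rate_ii(v) * exp(int_v^s rate_ii) dv."""
    h = (s - t) / n
    total = 0.0
    for k in range(n):
        v = t + (k + 0.5) * h
        total += rate_ii(v) * math.exp(integrated_rate(v, s)) * h
    return total

def closed_form(t, s):
    """Right-hand side of the identity: -1 + exp(int_t^s rate_ii)."""
    return -1.0 + math.exp(integrated_rate(t, s))

if __name__ == "__main__":
    print(inner_integral(0.0, 2.0), closed_form(0.0, 2.0))
```

The quantity `1 + closed_form(t, s)` is the no-jump probability (8.1b) for this hypothetical rate, so it should lie strictly between 0 and 1.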

It should be noted that Rishel's Theorem 3 of [7] can also be directly
extended to this class of stochastic systems, yielding a maximum principle
for the partial differential operator of the form (4.10).

Notation, Terminology and Abbreviations


I 


m 

0 


m  xm 
o 


1 


(mxm) -dimensional  identity  matrix 

(m  xm, ) -dimensional  zero  matrix  •  o  AO 
o  i  ’  mxm  =  m 


m 


E 

Pk 

U,. 


m 


U. 


k,n 


k 

^k 

^  k,n 


r 

Y 

ir 


k 

k 

k 


(a,b) 


Zero  vector  of  dimension  m. 
m-dimensional  real  line  (Euclidean  space) 

Player  k 

Control  set  of  Pk 

Control  set  of  Pk  at  stage  n,  for  the  discrete-time  game 

Set  of  open-loop  controls  of  Pk 

Open  or  closed-loop  control  of  Pk 

Set  of  closed-loop  feedback  controls  of  Pk 

Set  of  closed-loop  feedback  controls  of  Pk  at  stage  n,  for  the 
discrete-time  game 

Strategy  space  of  Pk 

Strategy  of  Pk 

(a,b)  if  k=l 

(b,a)  if  k=2 


$\bar{k}$ :  1 if k=2;  2 if k=1

w.r.t. :  with respect to

[0,T] :  Time interval on which the differential game is defined

X :  State space

$S_k$ :  Set of indices corresponding to the states of the jump process with
asymmetric mode of play under Pk's leadership

N :  Set of indices corresponding to the states of the jump process with
symmetric mode of play

$I \triangleq S_1 + S_2 + N$

$\emptyset$ :  empty set


Abstract: This paper introduces a "Feedback Stackelberg" solution concept
for continuous-time multi-level dynamic optimization problems, and discusses
its appropriateness for decision making and optimization in the presence of
hierarchy.

1. INTRODUCTION

This paper introduces a "generalized feedback Stackelberg solution" concept
for continuous-time multi-level dynamic optimization problems. The basic
idea is to relate this solution concept to a class of feedback equilibria
with particular information structures.

A complete and formal discussion of these topics is provided in Ref. [1].
The present paper is devoted to a quick presentation of the main results
proved in [1] and a first discussion of the possible use of this solution
concept in the modeling of imperfect competition problems in economics.

In section 2 the Stackelberg solution concept is discussed in the realm of
extensive games. It is shown on simple game trees that the Stackelberg
solution concept is in fact an equilibrium solution associated with a
peculiar information structure. The feedback Stackelberg solution (FSS)
concept for multistage games is then discussed as a natural extension of
this feedback equilibrium solution.

With this new interpretation of the FSS, a natural extension is to let the
leadership change with time. This leads to the concept of "generalized
feedback Stackelberg solution" presented in section 3 for a continuous-time
system with jump disturbances affecting the structure and the mode of play
of the game. The link between the multistage and differential game
formulations is more fully studied in [1] using the δ-game approach
proposed by Friedman ([2]).

This new solution concept permits one to formulate a new class of imperfect
competition models, where the market structure changes, with dominant firms
at risk of losing their leadership. This is discussed in section 4.

2. THE STACKELBERG SOLUTION AS AN EQUILIBRIUM

Consider a static two-player game defined by the cost functionals
$J_k: U_1 \times U_2 \to R$, k=1,2, where $U_k$ is the set of admissible
controls for player k (denoted Pk hereinafter).

If P1 is the leader, he announces his control $u_1 \in U_1$ first, to which
P2 reacts optimally by minimizing $J_2(u_1,u_2)$ over $u_2 \in U_2$. Assume
there is a unique mapping $T_2$ such that

$$J_2(u_1, T_2(u_1)) = \min_{u_2 \in U_2} J_2(u_1,u_2), \quad \forall u_1 \in U_1.$$

Then a pair $(u_1^*,u_2^*) \in U_1 \times U_2$ is called a Stackelberg
solution for the static game with P1 the leader if

$$u_1^* = \arg\min_{u_1 \in U_1} J_1(u_1, T_2(u_1)), \qquad u_2^* = T_2(u_1^*).$$

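A diagrammatic representation of such a game is given in Figure 1. The game
is also defined as the following matrix game, in which each entry is the
cost pair $(J_1, J_2)$, the rows correspond to $u_1$ and the columns to $u_2$.

             u2 = 0    u2 = 1
  u1 = 0      1,1       2,0
  u1 = 1      1,2       0,4

Fig. 1. Static Game (game tree with terminal cost pairs 1,1; 2,0; 1,2; 0,4)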


Here one has

$$T_2(0) = 1, \qquad T_2(1) = 0.$$

Therefore

$$\min_{u_1 \in U_1} J_1(u_1, T_2(u_1)) = \min\{2, 1\} = 1$$

and the pair $(u_1^*, u_2^*) = (1,0)$ is the Stackelberg solution with P1
the leader.
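The computation above can be reproduced by exhaustive search. The Python sketch below is illustrative only: the dictionary `COSTS` and the function names are not from the paper, and the indexing convention (first key $u_1$, second key $u_2$, entries $(J_1, J_2)$) is the one consistent with the reaction values $T_2(0)=1$, $T_2(1)=0$ stated in the text.

```python
# Cost pairs (J1, J2) of the matrix game of Fig. 1, indexed as COSTS[u1][u2].
COSTS = {
    0: {0: (1, 1), 1: (2, 0)},
    1: {0: (1, 2), 1: (0, 4)},
}
U1 = (0, 1)  # admissible controls of P1 (leader)
U2 = (0, 1)  # admissible controls of P2 (follower)

def follower_reaction(u1):
    """Return T2(u1), the unique minimizer of J2(u1, .) over U2."""
    best = min(COSTS[u1][u2][1] for u2 in U2)
    minimizers = [u2 for u2 in U2 if COSTS[u1][u2][1] == best]
    if len(minimizers) != 1:
        raise ValueError("the reaction of P2 to u1=%r is not unique" % (u1,))
    return minimizers[0]

def stackelberg_solution():
    """Return (u1*, u2*) with P1 the leader, found by enumeration."""
    u1_star = min(U1, key=lambda u1: COSTS[u1][follower_reaction(u1)][0])
    return u1_star, follower_reaction(u1_star)

if __name__ == "__main__":
    u1_star, u2_star = stackelberg_solution()
    print("T2 =", {u1: follower_reaction(u1) for u1 in U1})
    print("Stackelberg solution:", (u1_star, u2_star))
    print("costs (J1, J2):", COSTS[u1_star][u2_star])
```

Enumeration is only practical for finite control sets such as this one; it is used here because it mirrors the definition directly.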

Now consider the extensive game defined by the tree and the information
structure shown in Figure 2. (For a recent presentation of games in
extensive form see Ref. [3].) Next to Figure 2 the matrix game formulation
is given. This game in extensive form describes the sequence of two moves:
first P1 moves, then P2 moves knowing the control chosen by P1. Therefore
the decision of P2 is now described by a mapping $\gamma_2: U_1 \to U_2$.
There are four of them: (0,0) is the mapping such that $\gamma_2(0)=0$,
$\gamma_2(1)=0$; (0,1) is the mapping such that $\gamma_2(0)=0$,
$\gamma_2(1)=1$; etc.

Fig. 2: Game in extensive form

The circled cost pair (1,2) in the bimatrix game is clearly a Nash
equilibrium. The associated pair of controls $(u_1^*,u_2^*)$ defined by
$u_1^* = 1$, $u_2^* = \gamma_2^*(1) = 0$ is also the Stackelberg solution
previously obtained. Therefore it is possible to interpret the Stackelberg
solution as a particular equilibrium in a suitably defined game in
extensive form. Notice that there may be many other equilibria for the game
in Fig. 2 due to informational nonuniqueness (see Ref. [3]). What has been
illustrated on this simple example can be generalized to more general games
such as the multistage games considered by Simaan and Cruz in Ref. [4a,b].
One can prove the following proposition (see [1]):

Proposition 1: «If an N-stage dynamic feedback game admits a unique
feedback Stackelberg solution (in the sense of Ref. [4]) with P1 the
leader, there exists a 2N-stage dynamic feedback game which admits a unique
feedback Nash equilibrium, with a unique correspondence between these two
solutions.»
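For the two-move example that precedes Proposition 1, the Nash equilibria of the game in which P2 chooses a mapping $\gamma_2$ can be listed by enumeration. The sketch below is an illustration under stated assumptions, not the construction of [1]: `COSTS` uses the same hypothetical indexing `COSTS[u1][u2] = (J1, J2)` as the matrix game of Fig. 1, and a mapping is stored as the tuple $(\gamma_2(0), \gamma_2(1))$.

```python
from itertools import product

# Cost pairs (J1, J2) of the matrix game of Fig. 1, indexed as COSTS[u1][u2].
COSTS = {
    0: {0: (1, 1), 1: (2, 0)},
    1: {0: (1, 2), 1: (0, 4)},
}
U1 = (0, 1)
U2 = (0, 1)

# All mappings gamma2: U1 -> U2, each stored as (gamma2(0), gamma2(1)).
GAMMA2 = list(product(U2, repeat=len(U1)))

def outcome(u1, gamma2):
    """Cost pair obtained when P1 plays u1 and P2 plays the mapping gamma2."""
    return COSTS[u1][gamma2[u1]]

def nash_equilibria():
    """Return every pair (u1, gamma2) from which no player gains by deviating alone."""
    equilibria = []
    for u1, g2 in product(U1, GAMMA2):
        j1, j2 = outcome(u1, g2)
        p1_cannot_improve = all(j1 <= outcome(v1, g2)[0] for v1 in U1)
        p2_cannot_improve = all(j2 <= outcome(u1, h2)[1] for h2 in GAMMA2)
        if p1_cannot_improve and p2_cannot_improve:
            equilibria.append((u1, g2))
    return equilibria

if __name__ == "__main__":
    for u1, g2 in nash_equilibria():
        print("u1 =", u1, "gamma2 =", g2, "costs =", outcome(u1, g2))
```

With these data the enumeration returns two equilibria, both with cost pair (1,2); the one with $\gamma_2 = (1,0)$ coincides with the reaction mapping $T_2$, which illustrates the informational nonuniqueness mentioned in the text.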

This  result  permits  one  to  relate  the  feedback 
Stackelberg  solution  to  the  normal  form  description  of 
the  game  and  therefore  to  give  a  "good  definition"  of 
this  solution  concept  with  regard  to  the  theory  of 
games.  Furthermore  the  interpretation  of  leadership 
as  an  asymmetry  in  the  information  structure  leads  to 
the  consideration  of  dynamical  systems  with  varying 
leadership.  This  idea  will  be  further  pursued  in  the 
next  section  which  deals  with  continuous  time  systems. 

3.  CONTINUOUS  TIME  FEEDBACK  STACKELBERG  SOLUTION 

Consider for $t \in [0,T]$ a stochastic system of the form

$$\dot{x} = f^{r(t)}(t,x,u_1,u_2), \quad x \in X, \quad k = 1,2 \tag{1}$$

with initial condition

$$x(0) = x^0, \quad r(0) = r^0.$$

The state x belongs to $X \subseteq R^{m_0}$, and the control $u_k$ takes
values in $U_k \subseteq R^{m_k}$ (k=1,2). In (1) r(t) is a finite-state
stochastic jump process which takes values in I. The RHS of (1) changes
from $f^i(\cdot)$ to $f^j(\cdot)$ as r(t) jumps from i to j. For each
$i \in I$, $f^i$ is $C^1$ w.r.t. all its arguments. Let $\mathcal{U}_k$,
k=1,2, be two classes of admissible control laws $u_k^i(t,x)$ with values
in $U_k$, defined on $I \times [0,T] \times X$, such that $u_k^i(\cdot)$ is
piecewise continuous in t, $C^1$ in x.

The relationship between the processes x(t) and r(t) is assumed to be such
that for any control law and almost any ω in the sample space (Ω,F) there
exists a piecewise constant function $\mu_\omega: [0,T] \times X \to I$
such that $(x(t,\omega), r(t,\omega))$ satisfies

$$\dot{x} = f^{\mu_\omega(t,x)}\big(t, x, u_1^{\mu_\omega(t,x)}(t,x), u_2^{\mu_\omega(t,x)}(t,x)\big)$$
$$r(t,\omega) = \mu_\omega(t,x).$$

It is also assumed that for any admissible control law
$u \in \mathcal{U}_1 \times \mathcal{U}_2$ the following conditional Markov
property holds:

$$P^u[\mu(t+h, x(t+h)) = j \mid \mu(t,x(t)) = i,\ x(t) = x] = \lambda_{ij}(t,x)\,h + o(h) \tag{2}$$
$$P^u[\mu(t+h, x(t+h)) = i \mid \mu(t,x(t)) = i,\ x(t) = x] = 1 + \lambda_{ii}(t,x)\,h + o(h) \tag{3}$$

where $o(h)/h \to 0$ uniformly in x and u.
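To first order in h, (2) and (3) define a one-step transition matrix for the mode process. The sketch below builds it for rates frozen at a given (t,x). It relies on one assumption not stated explicitly in this section, namely $\lambda_{ii} = -\sum_{j \ne i} \lambda_{ij}$ (needed for each row to sum to one); the function name and the numerical rates are hypothetical.

```python
def one_step_transition(rates, h):
    """First-order transition matrix P = I + Lambda * h for a small step h.

    rates[i][j], j != i, are the jump rates lambda_ij >= 0 frozen at a
    given (t, x); the diagonal entries of `rates` are ignored and
    lambda_ii is recomputed as minus the sum of the off-diagonal rates
    (an assumption made here so that rows sum to one).
    """
    n = len(rates)
    matrix = []
    for i in range(n):
        off_diagonal = [rates[i][j] if j != i else 0.0 for j in range(n)]
        lambda_ii = -sum(off_diagonal)
        row = [off_diagonal[j] * h for j in range(n)]
        row[i] = 1.0 + lambda_ii * h
        if row[i] < 0.0:
            raise ValueError("step h is too large for the first-order approximation")
        matrix.append(row)
    return matrix

if __name__ == "__main__":
    # Hypothetical rates for three modes, e.g. 0 (no leader), 1, 2.
    rates = [[0.0, 0.3, 0.2],
             [0.1, 0.0, 0.4],
             [0.5, 0.5, 0.0]]
    for row in one_step_transition(rates, 0.01):
        print(row)
```

The o(h) terms of (2) and (3) are dropped, so the matrix is only meaningful for small h.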

For any admissible control law u let $V_k^i(t,x)$ denote the corresponding
value of the conditional expectation

$$V_k^i(t,x) = E_u\big[q_k^{r(T)}(x(T)) \mid x(t) = x,\ r(t) = i\big] \tag{4}$$

where $q_k^i(\cdot)$, $i \in I$, k=1,2, are $C^1$ functions.

The system (1)-(4) is a slight generalization of the one studied by Rishel
in [5]. The process r(t) models random structural changes in the dynamical
system to be controlled. The two controls $u_1$ and $u_2$ correspond
respectively to P1 and P2, and one assumes that the mode of play of the
dynamic game is determined by the outcome of r(t). More precisely, I is
assumed to be partitioned into three subsets, $I = S_1 \cup S_2 \cup N$. If
$r(t) \in S_k$, it is said that Pk is the leader at time t. If
$r(t) \in N$, there is no leader at time t. Consider two classes
$\Gamma_k$ (k=1,2) of functions
$\gamma_k: [0,T] \times X \times I \times U_{\bar{k}} \to U_k$, where
$\bar{k}=2$ if k=1, $\bar{k}=1$ if k=2, and such that

$$i \in N \cup S_k \;\Rightarrow\; \gamma_k(t,x,i,u_{\bar{k}}) \equiv \gamma_k(t,x,i). \tag{5}$$

The relation (5) expresses the fact that when Pk is not the follower the
function $\gamma_k$ cannot depend on $u_{\bar{k}}$. The functions
$\gamma_k$ are assumed to be $C^1$ in x and $u_{\bar{k}}$. It is possible
to associate a control law $u \in \mathcal{U}_1 \times \mathcal{U}_2$ with
a pair $(\gamma_1,\gamma_2)$ by defining

$$u_k^i(t,x) \triangleq \gamma_k(t,x,i) \quad \text{if } i \in N \cup S_k$$
$$u_k^i(t,x) \triangleq \gamma_k(t,x,i,\gamma_{\bar{k}}(t,x,i)) \quad \text{if } i \in S_{\bar{k}} \tag{6}$$

A pair $(\gamma_1,\gamma_2) \in \Gamma_1 \times \Gamma_2$ is called a
pure-feedback strategy (PFS) pair if the control law defined through (6) is
admissible. Associated with a PFS are two cost functions

$$J_k(t,x,i;\gamma_1,\gamma_2) \triangleq E_u\big[q_k^{r(T)}(x(T)) \mid x(t)=x,\ r(t)=i\big] \tag{7}$$

where u satisfies (6).
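The way (6) turns a strategy pair into a control pair can be sketched in a few lines. Everything named below is hypothetical and chosen only for illustration: the mode labels (0 for no leader, k for Pk the leader, as in the advertising example of section 4.2), the two sample strategies, and the convention that a strategy receives `None` in place of the opponent's control when (5) forbids any dependence on it.

```python
# Hypothetical mode sets: S_1 = {1}, S_2 = {2}, N = {0}.
LEADER_SETS = {1: {1}, 2: {2}}

def other(k):
    """Index of the opponent of player k."""
    return 2 if k == 1 else 1

def control_pair(gamma, t, x, i):
    """Return (u_1, u_2) generated by the strategy pair gamma in mode i, as in (6).

    gamma[k] is a callable gamma_k(t, x, i, u_other); u_other is None
    whenever Pk is not the follower.
    """
    controls = {}
    for k in (1, 2):
        k_bar = other(k)
        if i in LEADER_SETS[k_bar]:
            # Pk is the follower: it reacts to the control announced by the leader.
            u_leader = gamma[k_bar](t, x, i, None)
            controls[k] = gamma[k](t, x, i, u_leader)
        else:
            # Pk is the leader, or there is no leader.
            controls[k] = gamma[k](t, x, i, None)
    return controls[1], controls[2]

def gamma_1(t, x, i, u_2=None):
    """Sample strategy of P1: reacts to u_2 only when P2 leads (i == 2)."""
    return 0.5 * u_2 if i == 2 else 0.2

def gamma_2(t, x, i, u_1=None):
    """Sample strategy of P2: reacts to u_1 only when P1 leads (i == 1)."""
    return 0.1 + u_1 if i == 1 else 0.3

if __name__ == "__main__":
    strategies = {1: gamma_1, 2: gamma_2}
    for mode in (0, 1, 2):
        print("mode", mode, "->", control_pair(strategies, 0.0, 1.0, mode))
```

Passing `None` to the non-follower is one possible encoding of relation (5); a stricter design would use two separate function signatures.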


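A PFS pair $(\gamma_1^*,\gamma_2^*)$ is called a weak equilibrium if, for
every $\tau \in [0,T]$, $\xi \in X$ and $i \in I$, the following holds:

$$J_1(\tau,\xi,i;\gamma_1^*,\gamma_2^*) \le J_1(\tau,\xi,i;\gamma_1,\gamma_2^*)$$
$$J_2(\tau,\xi,i;\gamma_1^*,\gamma_2^*) \le J_2(\tau,\xi,i;\gamma_1^*,\gamma_2) \tag{8}$$

for any PFS pair $(\gamma_1,\gamma_2^*)$ or $(\gamma_1^*,\gamma_2)$.

This is the usual definition of a feedback Nash equilibrium. Now, by
definition of $\gamma_{\bar{k}}$, P$\bar{k}$ has the opportunity to react
"locally" at t to the choice of control made by Pk if the latter is the
leader ($r(t) \in S_k$).

Define the ε-deviation of Pk at (t,x,i) with value $u_k$ as a PFS
$\gamma^D_{k;\varepsilon,u_k}$ such that, $\forall \tau \in [t,t+\varepsilon]$,
if r(s) = r(t) for $s \in [t,\tau]$ then $u_k^i(\tau) = u_k$, and which
coincides with $\gamma_k^*$ otherwise.

Define the ε-reaction of P$\bar{k}$ at (t,x,i) with value $u_{\bar{k}}$ as
a PFS $\gamma^R_{\bar{k};\varepsilon,u_{\bar{k}}}$ such that,
$\forall \tau \in [t,t+\varepsilon]$, if r(s) = r(t) for $s \in [t,\tau]$
then $u_{\bar{k}}^i(\tau) = u_{\bar{k}}$, and which coincides with
$\gamma_{\bar{k}}^*$ otherwise.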

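Definition: A PFS pair $(\gamma_1^*,\gamma_2^*)$ is a strong equilibrium
(or a generalized feedback Stackelberg solution (GFSS)) if: (i) it is a
weak equilibrium; (ii) for any
$(t,\xi,i) \in [0,T] \times X \times S_k$, $u_k \in U_k$,
$u_{\bar{k}} \in U_{\bar{k}}$, the ε-deviation
$\gamma^D_{k;\varepsilon,u_k}$ and the ε-reaction
$\gamma^R_{\bar{k};\varepsilon,u_{\bar{k}}}$ are well defined for ε small
enough and the following holds:

$$J_{\bar{k}}(t,\xi,i;\gamma^D_{k;\varepsilon,u_k},\gamma^*_{\bar{k}}) \le J_{\bar{k}}(t,\xi,i;\gamma^D_{k;\varepsilon,u_k},\gamma^R_{\bar{k};\varepsilon,u_{\bar{k}}}) + o(\varepsilon) \tag{9}$$

with $\lim_{\varepsilon \to 0} o(\varepsilon)/\varepsilon = 0$.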

Remark 1: When P$\bar{k}$ is the follower he has the possibility to adjust
to the choice of control by Pk. In a GFSS this adjustment is "locally
optimal".

Remark 2: In Ref. [1] the differential game is more precisely defined using
the δ-game approach of Friedman. It is shown that a PFS may be defined as a
sequence of δ-strategies used for G(δ) games having the structure of
multistage games.

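The main result, proved in [1], is the following Hamilton-Jacobi equation
characterization of a GFSS.

Proposition: Under assumptions A1-A5 (specified in [1]), a necessary and
sufficient condition for a PFS pair $(\gamma_1^*,\gamma_2^*)$, generating
the control law $u^*$, to be a GFSS is that for each $i \in I$, k=1,2, the
cost-to-go functions $V_k^i(t,x)$ defined by (4) and (6) satisfy the
following partial differential equations:

$$\min_{u_k \in U_k}\Big\{\frac{\partial}{\partial t}V_k^i(t,x) + \frac{\partial}{\partial x}V_k^i(t,x)\, f^i\big(t,x,[u_k,\gamma_{\bar{k}}^*(t,x,i,u_k)]\big)\Big\} + \sum_{j\in I}\lambda_{ij}(t,x)\,V_k^j(t,x) = \frac{\partial}{\partial t}V_k^i(t,x) + \frac{\partial}{\partial x}V_k^i(t,x)\, f^i\big(t,x,u_1^{i*}(t,x),u_2^{i*}(t,x)\big) + \sum_{j\in I}\lambda_{ij}(t,x)\,V_k^j(t,x) = 0 \tag{10}$$

with the boundary conditions

$$V_k^i(T,x) = q_k^i(x), \quad i \in I,\ x \in X \tag{11}$$

and with the additional condition that

$$\forall (t,x) \in [0,T] \times X,\ \forall i \in S_{\bar{k}},\ \forall u_{\bar{k}} \in U_{\bar{k}},\ \forall u_k \in U_k:$$
$$\frac{\partial}{\partial x}V_k^i(t,x)\, f^i\big(t,x,[u_{\bar{k}},\gamma_k^*(t,x,i,u_{\bar{k}})]\big) \le \frac{\partial}{\partial x}V_k^i(t,x)\, f^i\big(t,x,[u_{\bar{k}},u_k]\big) \tag{12}$$

k=1,2, $\bar{k}$=1,2, $\bar{k} \ne k$.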

Remark 3: In the particular case where there is no uncertainty, with one
player Pk the leader, the concept of GFSS happens to be the continuous-time
counterpart of the feedback Stackelberg solution concept proposed in [4]
for multistage systems.

4. APPLICATION TO A NEW CLASS OF IMPERFECT COMPETITION MODELS

4.1 Dominant firms and leadership

Von Stackelberg initially proposed the solution concept associated with his
name as a description of the economic warfare between two firms in a
situation of duopoly (Ref. [6]). He insisted on the fact that, in most
cases, both firms would desire to become the "leader", which would lead to
an impossible equilibrium.

Some markets are characterized by a "dominant firm", which is usually the
"price leader", and a "competitive fringe" composed of small firms acting
as "price takers". The Stackelberg solution concept is applicable to such a
situation. Recently several attempts have been made to model such markets
using a differential game approach. Gilbert [7] proposes, for example, an
interesting model of an "OPEC-like" cartel exploiting an exhaustible
resource. The proposed model makes use of the so-called open-loop
Stackelberg solution, and it is therefore assumed that the cartel announces
its production path for the whole future.

Such an assumption is difficult to accept without criticism since such
behaviour is seldom observed (certainly not from OPEC members). What is
observed, in the case of a market dominated by a firm or a cartel, is only
the fact that the leader has to announce his current decision first.
Non-OPEC members (e.g. Canada, England, etc.) know, at each instant of
time, the price set by OPEC members and they can adjust to this
information. What they do not know (and the recent impact of the oil glut
on the Canadian and Mexican economies shows it) is the path of oil
extraction or the path of oil price for the future.

We contend that the generalized feedback Stackelberg solution, with its
interpretation as an equilibrium associated with a particular information
structure, should be more appropriate for the analysis of such imperfect
markets.

It may happen that the Nash feedback solution leaves one player better off
than the FSS with this player the leader. However, this is perfectly normal
if being the "leader" only means that a sort of informational disadvantage
is associated with the dominant position. The leader has to act first and
thus gives some extra information to the follower.

Furthermore, by letting the leadership be the outcome of a random process,
the displacement of leadership from one player to another can be modelled.
This would eliminate some difficulties associated with the desire of each
player to become the leader.

In the next sub-section a model of competition through advertising is
proposed as an illustration of the modeling permitted by GFSS.


4.2 Profit maximization through advertising

Two firms are competing on a given market through their advertising
expenditures. It is assumed that the total advertising made by both firms
together creates the demand for the product on the market, while the
relative values of advertising per dollar of sale at a given time determine
the market share obtained by each firm. It is finally assumed that the
marginal production costs are constant for both firms.

The state of this system at time t is supposed to be described by the pair
(x(t), r(t)) where $x(t) \ge 0$ is the total demand for the marketed
product, expressed in dollars of sale, and $r(t) \in \{0,1,2\}$ is an
indicator of the presence of a dominant firm ($r(t) = k \ne 0$ means that
firm k is dominant and acts as a leader at time t; r(t) = 0 means that
there is no leader).

The control of each firm k at time t is given by $u_k(t)$, the advertising
expenditure per dollar of sale. So, if firm k has a sales level of
$x_k(t)$ at time t, then its total advertising expenditure will be

$$a_k(t) = u_k(t)\, x_k(t). \tag{13}$$

In order to describe the sharing of the market at time t, consider a
function $g^j: R^2 \to [0,1]$ defined for each value $j \in \{0,1,2\}$ and
such that

$$x_1(t) = g^{r(t)}(u_1(t),u_2(t))\, x(t)$$
$$x_2(t) = \big(1 - g^{r(t)}(u_1(t),u_2(t))\big)\, x(t) \tag{14}$$

Notice that it is assumed that the market structure (i.e. the existence of
a dominant firm) can affect the market sharing mechanism.

The process r(t) is assumed to be a Markov chain with continuous jump rate
functions $\lambda_{ij}(t)$, i,j=0,1,2, while the sales evolution is
described by

$$\dot{x}(t) = \big(\alpha G^{r(t)}(u_1(t),u_2(t)) - \beta\big)\, x(t) \triangleq \alpha\big[u_1(t)\, g^{r(t)}(u_1(t),u_2(t)) + u_2(t)\big(1 - g^{r(t)}(u_1(t),u_2(t))\big)\big]\, x(t) - \beta x(t). \tag{15}$$

Notice that, according to (13)-(14), the first term in the R.H.S. of (15)
corresponds to $\alpha(a_1(t)+a_2(t))$, and therefore is proportional to
the total advertising made at time t. The constants α and β are positive.

The equation (15) shows an exponential decay of the sales in the absence of
advertising.
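The decay property can be checked on a small simulation of (15) with constant controls. The sketch below is an illustration under assumptions that are not part of the model as stated: the values of `ALPHA` and `BETA`, the particular market-share function `share` (taken independent of the mode r(t) for simplicity), and the explicit Euler scheme are all hypothetical choices.

```python
import math

ALPHA = 0.8  # hypothetical value of the constant alpha > 0
BETA = 0.3   # hypothetical value of the constant beta > 0

def share(u1, u2):
    """Hypothetical market share g of firm 1, with values in [0, 1]."""
    total = u1 + u2
    return 0.5 if total == 0.0 else u1 / total

def growth_rate(u1, u2):
    """Coefficient alpha * G(u1, u2) - beta multiplying x(t) in (15)."""
    g = share(u1, u2)
    return ALPHA * (u1 * g + u2 * (1.0 - g)) - BETA

def simulate_sales(x0, u1, u2, horizon, steps=10000):
    """Explicit Euler integration of (15) with constant controls u1, u2."""
    h = horizon / steps
    x = x0
    for _ in range(steps):
        x += growth_rate(u1, u2) * x * h
    return x

if __name__ == "__main__":
    # Without advertising the sales follow x0 * exp(-BETA * t).
    print(simulate_sales(100.0, 0.0, 0.0, 5.0), 100.0 * math.exp(-BETA * 5.0))
```

With both controls set to zero the advertising term vanishes whatever the share function, so the simulated sales should track $x(0)e^{-\beta t}$ up to the discretization error of the Euler scheme.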

At time t, the rate of profit of firm k is given by

$$\Pi_k(t) = (1 - c_k - u_k(t))\, x_k(t) \tag{16}$$

where $c_k$ is the constant production cost. Each firm wants to maximize
its total expected profit over a fixed time period [0,T].

The equations (13)-(16) describe a dynamical system having the properties
assumed in section 3.


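Using the necessary and sufficient conditions stated in section 3 one is
able to characterize a GFSS $(\gamma_1^*,\gamma_2^*)$ by solving the
Hamilton-Jacobi equations

$$-\frac{\partial}{\partial t}V_k^j(t,x) - \sum_{i=0}^{2}\lambda_{ji}(t)\,V_k^i(t,x) = \begin{cases} \text{(a)}\ \displaystyle\max_{u_k}\Big[\frac{\partial}{\partial x}V_k^j(t,x)\big(\alpha G^j([u_k,u_{\bar{k}}^*(t,x)])-\beta\big)x + (1-c_k-u_k)\,g_k^j([u_k,u_{\bar{k}}^*(t,x)])\,x\Big] & \text{if } j \ne k \\[2ex] \text{(b)}\ \displaystyle\max_{u_k}\Big[\frac{\partial}{\partial x}V_k^j(t,x)\big(\alpha G^j([u_k,T_{\bar{k}}^j])-\beta\big)x + (1-c_k-u_k)\,g_k^j([u_k,T_{\bar{k}}^j])\,x\Big] & \text{if } j = k \end{cases}$$

where $T_{\bar{k}}^j$ is evaluated at
$\big(t,x,\frac{\partial}{\partial x}V_{\bar{k}}^j,u_k\big)$ and

$$T_{\bar{k}}^j\Big(t,x,\frac{\partial}{\partial x}V_{\bar{k}}^j,u_k\Big) = \arg\max_{u_{\bar{k}}}\Big[\frac{\partial}{\partial x}V_{\bar{k}}^j(t,x)\big(\alpha G^j([u_k,u_{\bar{k}}])-\beta\big)x + (1-c_{\bar{k}}-u_{\bar{k}})\,g_{\bar{k}}^j([u_k,u_{\bar{k}}])\,x\Big].$$

Here one has used the notation $g_k^j$ for the function which is equal to
$g^j$ if k=1 or to $1-g^j$ if k=2. Also $[u_k,u_{\bar{k}}]$ stands for the
pair $(u_1,u_2)$ if k=1 or 2. The control laws $u_k^*(t,x)$ are the control
laws generated by $(\gamma_1^*,\gamma_2^*)$ as shown in section 3.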

A solution to these equations can be found in the class of strategies
defining control laws which are independent of x. To show this, assume that
the functions $V_k^j$ have the particular affine form

$$V_k^j(t,x) = \ell_k^j(t) + m_k^j(t)\,x \tag{17}$$

and therefore satisfy

$$\frac{\partial}{\partial x}V_k^j(t,x) = m_k^j(t). \tag{18}$$

Substituting in the R.H.S. of the Hamilton-Jacobi equations one sees that x
factorizes in these expressions, leading to the following conditions on
$u_k^{j*}$, k=1,2:

$$u_k^{j*} = \arg\max_{u_k}\Big[m_k^j(t)\big(\alpha G^j([u_k,u_{\bar{k}}^{*}(t,x)])-\beta\big) + (1-c_k-u_k)\,g_k^j([u_k,u_{\bar{k}}^{*}(t,x)])\Big] \quad \text{if } j \ne k$$

$$u_k^{j*} = \arg\max_{u_k}\Big[m_k^j(t)\big(\alpha G^j([u_k,T_{\bar{k}}^j(t,u_k)])-\beta\big) + (1-c_k-u_k)\,g_k^j([u_k,T_{\bar{k}}^j(t,u_k)])\Big] \quad \text{if } j = k$$

with

$$T_{\bar{k}}^j(t,u_k) = \arg\max_{u_{\bar{k}}}\Big[m_{\bar{k}}^j(t)\big(\alpha G^j([u_k,u_{\bar{k}}])-\beta\big) + (1-c_{\bar{k}}-u_{\bar{k}})\,g_{\bar{k}}^j([u_k,u_{\bar{k}}])\Big].$$

It appears that a solution to these conditions can be found in the class of
control laws

$$u_k^{j*}(t,x) \equiv s_k^j(t), \quad j=0,1,2;\ k=1,2.$$

It would now be straightforward to verify that such a control law is
compatible with the affine structure (17) for the functions $V_k^j(t,x)$,
eventually yielding $\ell_k^j(t) \equiv 0$, j=0,1,2; k=1,2, and
$m_k^j(t)$, j=0,1,2; k=1,2, obtained as the solution of a set of nonlinear
differential equations.

This characterization of an equilibrium is obviously reminiscent of Refs.
[8]-[11], dealing with a similar structure in various economic models based
on an optimal control or differential game paradigm.

When the time horizon T tends to infinity, and if the profits are
discounted, it is possible to obtain a solution of the Hamilton-Jacobi
equations in the class of stationary strategies. In this particular model
this would lead to a strong equilibrium obtained in the class of piecewise
constant control laws

$$u_k^i(t,x) \equiv u_k^i, \quad i = 0,1,2.$$

A consequence of the linear structure of the functions $V_k^i$ will then be
that the leader will always be better off by using his leadership position
(and announcing his current control) than by playing according to the usual
feedback Nash solution.

A detailed proof of this result is left for a forthcoming paper.


5.  CONCLUSION 


In this paper, the feedback Stackelberg solution has
been given a precise definition as a particular
equilibrium in a game of extensive form having a special
information structure. By using a limiting process à
la Friedman, a similar definition is given for a FSS
in a differential game with jump disturbances.

In Ref. [1] the case of Linear-Quadratic differential
games is fully treated and it is shown that, in the
presence of cross terms in the cost functions
involving the controls of both players, the FSS is
different from the usual feedback Nash equilibrium.

Abstract.  This paper discusses an extension of the currently available theory
of noncooperative dynamic games to game models whose state equations are of
order higher than one.  In a discrete-time framework it first elucidates the
reasons why the theory developed for first-order systems is not applicable to
higher order systems, and then presents a general procedure to obtain the
informationally unique Nash equilibrium solution in the presence of random
disturbances.  A numerical example solved in the paper illustrates the general
approach.

Key  Words.  Dynamic  games,  noncooperative  differential  games,  Nash  equilibrium 
solution,  uniqueness  of  equilibria,  second-order  systems,  stochastic  dynamics, 
closed-loop information pattern.

1.  Introduction

It is by now an established fact in the theory of dynamic games that
redundant information leads to nonuniqueness of noncooperative (Nash) equilibria,
giving rise to so-called informationally nonunique equilibrium solutions (Ref. 1).
This is true for both differential and discrete-time deterministic games under,
for example, a closed-loop information pattern for at least one player, which
includes memory (that is, knowledge of past values of the state variable).
One way of removing informational nonuniqueness in deterministic dynamic games
with closed-loop information patterns is to "robustify" the state equation by
including a zero-mean additive noise term, such as (in the case of a
discrete-time state equation with two players)

$$x_{k+1} = f(k, x_k, u_k, v_k) + w_k \tag{1.1}$$

where $w_k$ is the noise term accounting for the inaccuracies in the modelling.
Here, $x_k$, $u_k$, $v_k$ are the state variable and the control variables of
Players 1 and 2, respectively, and $\{w_k\}$ is a sequence of i.i.d. random
vectors of the same dimension as $x_k$ (say, $n$) with a probability distribution
that assigns positive probability mass to every open subset of $\mathbb{R}^n$.
With the inclusion of such a term in the state equation, informational
nonuniqueness disappears and the Nash equilibrium solution becomes more
meaningful (Ref. 1).
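
The role of the noise term can be illustrated with a small sketch. It is not taken from Ref. 1: it uses a hypothetical one-player scalar version of (1.1), $x_{k+1} = x_k + u_k + w_k$, and hypothetical control rules. Without noise, a stage-1 rule written in terms of $x_1$ and the same rule rewritten in terms of $x_0$ produce the same control along the trajectory, so the two representations cannot be distinguished; once $w_0 \neq 0$ they produce different controls.

```python
# Illustrative sketch (assumed rules, not the paper's game):
# scalar dynamics x_{k+1} = x_k + u_k + w_k with a single controller.

def gamma_0(x0):
    """Stage-0 control (hypothetical rule)."""
    return -0.5 * x0

def gamma_1_feedback(x0, x1):
    """Stage-1 control written in terms of the current state x1."""
    return -x1

def gamma_1_memory(x0, x1):
    """The same control rewritten in terms of x0, using the noise-free
    prediction x1 = x0 + gamma_0(x0) in place of the observed x1."""
    return -(x0 + gamma_0(x0))

def stage_1_controls(x0, w0):
    """Return both stage-1 controls when the stage-0 noise equals w0."""
    x1 = x0 + gamma_0(x0) + w0
    return gamma_1_feedback(x0, x1), gamma_1_memory(x0, x1)
```

With `w0 = 0.0` the two entries of `stage_1_controls(2.0, w0)` coincide; with any nonzero `w0` they differ by exactly `w0`, which is the sense in which the noise makes the past and current state values non-redundant.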

Another method to remove informational nonuniqueness in Nash equilibria
is to restrict the solution concept further to "delayed commitment" type Nash
equilibria, which leads to the so-called feedback Nash equilibrium, and this is
free of informational nonuniqueness (Ref. 1).  But both these methods, as discussed
above, and the verification of informational nonuniqueness of equilibria
assume that the dynamics are described by first-order (differential or difference)
equations.  However, this is not an exhaustive class of dynamic games, since
there exist models in both engineering and economics whose dynamics are initially
described by equations of second or higher order.  In this paper, we study a class
of such dynamic games described in the discrete-time domain and present a number
of results which shed light on the properties (such as existence, uniqueness,
solvability) of Nash equilibrium in dynamic games whose initial dynamics are not
of first order.

In the next section, we will provide a mathematical formulation of a
class of two-person nonzero-sum dynamic games whose state equations (in
discrete time) are of second order, and whose information pattern is closed-loop
for both players.  Using this class as a prototype model we will show why it is
not possible to reformulate such higher order state dynamics as first order
equations and utilize the currently available theory of dynamic games.  Then,
in Section 3 we will present and discuss a procedure which iteratively obtains
the Nash equilibrium solution of the class of problems formulated in Section 2,
and verify that it is informationally unique.  This procedure will then be
illustrated on a numerical example in Section 4, which will lead to a unique
Nash equilibrium solution.  The paper ends with the concluding remarks of
Section 5.

2.  A  Class  of  Games  with  Second-Order  Dynamics 

Assume that a game process evolves according to the second-order
difference equation

$$x_{k+2} = f(k, x_{k+1}, x_k, u_k, v_k) + w_k, \qquad x_0, x_1 \text{ given}, \quad k = 0, 1, \ldots \tag{2.1}$$

where $x_k$ is the $n$-dimensional state variable at the discrete time instant $k$,
$u_k$ and $v_k$ are the $r_1$- and $r_2$-dimensional control variables of Players 1
and 2, respectively, and $\{w_k\}$ is a zero-mean i.i.d. (independent identically
distributed) sequence of $n$-dimensional random variables with a probability
distribution that assigns positive probability mass to every open subset of
$\mathbb{R}^n$.

Let us take the information pattern for both players to be closed-loop:

$$\eta_k = \{x_k, x_{k-1}, \ldots, x_1, x_0\}, \qquad k = 0, 1, \ldots \tag{2.2}$$

which generates a finite dimensional space for each $k$.  The strategies of
Players 1 and 2 are, respectively,

$$\gamma_k : \eta_k \to \mathbb{R}^{r_1}; \qquad \beta_k : \eta_k \to \mathbb{R}^{r_2} \tag{2.3}$$

at time instant $k$, which are Borel-measurable functions.  Hence,

$$u_k = \gamma_k(\eta_k); \qquad v_k = \beta_k(\eta_k). \tag{2.4}$$

Finally, the cost function of Player $i$ is taken to be

$$J_i(\gamma, \beta) = E\Big\{ q^i(x_K) + \sum_{k=0}^{K-1} g^i(k, x_k, u_k, v_k) \Big\} \tag{2.5}$$

where $u_k$ and $v_k$ are given by (2.4),
$\gamma = \{\gamma_0, \ldots, \gamma_{K-1}\}$, $\beta = \{\beta_0, \ldots, \beta_{K-1}\}$,
$K$ is some positive integer (could be "infinite"), $q^i$ and $g^i$ are functionals
which are continuously differentiable in their arguments (except $k$), and $E$
denotes the expectation operator over the prior statistics of
$\{w_k\}_{k=0}^{K-2}$.

Let us recall that a pair $(\gamma^*, \beta^*)$, belonging to an admissible class,
is a Nash equilibrium if and only if the pair of inequalities

$$J_1(\gamma^*, \beta^*) \le J_1(\gamma, \beta^*) \tag{2.6a}$$

$$J_2(\gamma^*, \beta^*) \le J_2(\gamma^*, \beta) \tag{2.6b}$$

hold for all admissible $\gamma$ and $\beta$.
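
To make the ingredients (2.1)-(2.5) concrete, the following is a minimal sketch of how the two costs could be estimated by simulation for a given strategy pair in the scalar case. The function name, its argument layout and the use of Gaussian noise are assumptions made for illustration; the paper itself does not prescribe any numerical scheme.

```python
import random

def simulate_costs(f, g1, g2, q1, q2, gamma, beta, x0, x1, K,
                   noise_std=0.0, samples=1, seed=0):
    """Monte Carlo estimate of (J_1, J_2) in (2.5) for scalar states.

    gamma(k, history) and beta(k, history) receive history = (x_0, ..., x_k),
    i.e. the closed-loop information eta_k of (2.2).
    """
    rng = random.Random(seed)
    total1 = total2 = 0.0
    for _ in range(samples):
        xs = [x0, x1]
        cost1 = cost2 = 0.0
        for k in range(K):
            history = tuple(xs[:k + 1])
            u, v = gamma(k, history), beta(k, history)
            cost1 += g1(k, xs[k], u, v)
            cost2 += g2(k, xs[k], u, v)
            if k + 2 <= K:  # controls chosen at k = K-1 no longer affect x_K
                w = rng.gauss(0.0, noise_std)
                xs.append(f(k, xs[k + 1], xs[k], u, v) + w)
        total1 += cost1 + q1(xs[K])
        total2 += cost2 + q2(xs[K])
    return total1 / samples, total2 / samples
```

Checking the inequalities (2.6a)-(2.6b) numerically would amount to comparing such estimates for a candidate pair against unilateral deviations; this can only refute, not prove, the equilibrium property.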

Now, to utilize the available theory for first-order systems in obtaining
the solution of this class of dynamic games, one would immediately be tempted
to reformulate this problem by introducing a $2n$-dimensional variable

$$y_k = (y_k^{1\prime}, y_k^{2\prime})' \tag{2.7}$$

where

$$y_k^1 = x_k \tag{2.8a}$$

$$y_k^2 = x_{k+1} \tag{2.8b}$$

and thus increasing the dimension of the state by a factor of two.  Then, $y_k$
satisfies the first-order difference equation:

$$y_{k+1}^1 = y_k^2 \tag{2.9a}$$

$$y_{k+1}^2 = f(k, y_k^2, y_k^1, u_k, v_k) + w_k \tag{2.9b}$$

or

$$y_{k+1} = \tilde f(k, y_k, u_k, v_k) + \begin{pmatrix} 0 \\ w_k \end{pmatrix} \tag{2.10}$$

where the definition of $\tilde f$ should be obvious.  Furthermore, the cost
functions could be expressed in terms of the new state variable:

$$J_i = E\Big\{ \tilde q^i(y_K) + \sum_{k=0}^{K-1} \tilde g^i(k, y_k, u_k, v_k) \Big\} \tag{2.11}$$

for some appropriate $\tilde q^i$ and $\tilde g^i$.

Even though the above formulation appears to be in the standard
(first-order) form of a dynamic game, there are in fact two pitfalls (in disguise)
which render the available theory inapplicable:

1)  Because of the original information structure, the controls are not
allowed to depend on all components of $y$, but only on the first block
component $y^1$, that is

$$u_k = \gamma_k(y_k^1, y_{k-1}^1, \ldots, y_0^1)$$

and similarly for $v_k$.  Hence, the original perfect state information problem has
been turned (through reformulation) into a stochastic dynamic game with partial
state information.  Such problems are in general very difficult to solve, and
currently there is no general theory which would be applicable in this framework.

2)  The noise term in (2.10) does not directly affect all components
of  the  state  vector,  and  hence  the  sufficient  (and  generically  necessary) 
condition  for  informationally  unique  equilibrium  (cf.  Ref.  1)  is  not  satisfied. 
This  indicates  that,  even  though  we  were  able  to  obtain  a  solution  for  the  first- 
order  model  (stochastic  dynamic  game  with  partial  state  information)  formulated 
in  this  section,  we  would  not  be  able  to  conclude  that  there  was  no  other  solution 
which  would  also  constitute  a  Nash  equilibrium. 
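
The reformulation (2.7)-(2.10) and the second pitfall can be seen in a short sketch. It borrows the scalar linear map of the numerical example of Section 4 as an illustrative $f$; the function names are hypothetical. The augmented recursion reproduces the second-order trajectory, and the noise enters only the second block of $y_k$.

```python
def f(x_next, x_curr, u, v):
    """Illustrative second-order map: x_{k+2} = 2 x_{k+1} - x_k + u_k + v_k."""
    return 2 * x_next - x_curr + u + v

def run_second_order(x0, x1, controls, noise):
    """Iterate (2.1) directly; returns [x_0, x_1, x_2, ...]."""
    xs = [x0, x1]
    for (u, v), w in zip(controls, noise):
        xs.append(f(xs[-1], xs[-2], u, v) + w)
    return xs

def run_augmented(x0, x1, controls, noise):
    """Iterate (2.10) with y_k = (x_k, x_{k+1}); returns [y_0, y_1, ...]."""
    y = (x0, x1)
    ys = [y]
    for (u, v), w in zip(controls, noise):
        y1, y2 = y
        y = (y2, f(y2, y1, u, v) + w)  # noise enters the second block only
        ys.append(y)
    return ys
```

The first components of the augmented states coincide with the original trajectory, which is all a closed-loop controller is allowed to see; the second component $y_k^2 = x_{k+1}$ is not available at time $k$, which is the first pitfall.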

We therefore abandon the above first-order model, and seek to develop a
method of derivation for the second-order model originally formulated at the
beginning of this section.


3.  A Direct Iterative Method of Derivation and Existence of Informationally
Unique Equilibrium


What  we  intend  to  show  in  this  section  is  that  the  problem  formulated 
in  Section  2  indeed  admits  a  solution  which  can  be  obtained  (at  least  in  principle) 
by  a  careful  iterative  argument.  It  turns  out  that  there  is  no  informational 
nonuniqueness,  and  the  solution  depends,  in  general,  not  only  on  the  current 
values  of  the  state  but  also  on  the  entire  past  history. 

Towards this end, let us first assume, without any loss of generality,
that $g^i(K-1, \cdot, \cdot, \cdot)$ in (2.5) depends only on $x_{K-1}$ but not on
$u_{K-1}$ and $v_{K-1}$.  Furthermore, let $(\gamma, \beta)$ be a Nash equilibrium
solution.  With the dynamics evolved up to time $K-2$ under this set of
equilibrium solutions, let us isolate the game from that point onwards and see
how $(\gamma_{K-2}, \beta_{K-2})$ depends on the previously adopted policies.  This
new (reduced) game will have cost functions

$$J_i^{K-2}(\gamma_{K-2}, \beta_{K-2}) = E\{ q^i(x_K) + g^i(K-1, x_{K-1}) + g^i(K-2, x_{K-2}, u_{K-2}, v_{K-2}) \}, \qquad i = 1, 2 \tag{3.1}$$

in which $x_K$ and $x_{K-1}$ can be expressed through (2.1) in terms of $x_{K-2}$,
$u_{K-2}$, $v_{K-2}$, $x_{K-3}$, $u_{K-3}$, $v_{K-3}$, $w_{K-2}$, and $w_{K-3}$.  Such a
substitution then leads to, for some $h^i_{K-2}$ whose exact form will not be given
here but can easily be determined,

$$J_i^{K-2}(\gamma_{K-2}, \beta_{K-2}) = E\{ h^i_{K-2}(x_{K-2}, u_{K-2}, v_{K-2}, x_{K-3}, u_{K-3}, v_{K-3}, w_{K-2}, w_{K-3}) \} \tag{3.2}$$

where

$$u_{K-3} = \gamma_{K-3}(\eta_{K-3}), \qquad v_{K-3} = \beta_{K-3}(\eta_{K-3}),$$

and expectation is over the statistics of $w_{K-2}$ and $w_{K-3}$.

Since $x_{K-2}$ and $x_{K-3}$ do not depend on $w_{K-2}$ and $w_{K-3}$, we could first
take expectation of (3.2) over $w_{K-2}$ and $w_{K-3}$, and then minimize the
resulting expression over $u_{K-2}$ for $i=1$ with $v_{K-2}$ fixed, and over
$v_{K-2}$ for $i=2$ with $u_{K-2}$ fixed.  This leads to expressions of the form

$$u_{K-2} = \tilde\gamma_{K-2}(v_{K-2}, p_{K-2}), \qquad v_{K-2} = \tilde\beta_{K-2}(u_{K-2}, p_{K-2}) \tag{3.3}$$

where

$$p_{K-2} = (x_{K-2}, x_{K-3}, u_{K-3}, v_{K-3}). \tag{3.4}$$

Let us assume that the minimization problems above have, in fact, led to unique
solutions, and let us further assume that the set of simultaneous equations (3.3)
admits a unique solution:

$$u_{K-2} = \hat\gamma_{K-2}(p_{K-2}), \qquad v_{K-2} = \hat\beta_{K-2}(p_{K-2}). \tag{3.5}$$

[Note that this uniqueness is "structural," but not necessarily "informational,"
since we are basically solving a static problem.]
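
When the reaction relations (3.3) are not available in closed form, the simultaneous solution (3.5) could be approximated numerically. The sketch below uses plain fixed-point iteration on scalar reaction functions; this is an assumed numerical device, not part of the paper's argument, and it converges only when the combined reaction map is a contraction. The names `react_u`, `react_v` and `p` are hypothetical stand-ins for $\tilde\gamma_{K-2}$, $\tilde\beta_{K-2}$ and $p_{K-2}$.

```python
def solve_reactions(react_u, react_v, p, u=0.0, v=0.0, tol=1e-12, max_iter=1000):
    """Iterate u <- react_u(v, p), v <- react_v(u, p) until the pair settles.

    Returns an approximate solution of the simultaneous equations (3.3)
    for a fixed value of p; raises if the iteration does not converge.
    """
    for _ in range(max_iter):
        u_new = react_u(v, p)
        v_new = react_v(u, p)
        if abs(u_new - u) <= tol and abs(v_new - v) <= tol:
            return u_new, v_new
        u, v = u_new, v_new
    raise RuntimeError("reaction iteration did not converge")
```

For linear reactions with slope of magnitude less than one the iteration contracts, and the returned pair depends on $p$ only, as (3.5) requires.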

Hence, the bottom line of this analysis and discussion is that if
$(\gamma, \beta)$ is a Nash equilibrium solution, we necessarily have the
relationship

$$\gamma_{K-2}(\eta_{K-2}) = \hat\gamma_{K-2}(p_{K-2}), \qquad \beta_{K-2}(\eta_{K-2}) = \hat\beta_{K-2}(p_{K-2}) \tag{3.6}$$

where $\hat\gamma_{K-2}$, $\hat\beta_{K-2}$ are as determined above (by (3.5)),
$p_{K-2}$ is given by (3.4) and

$$u_{K-3} = \gamma_{K-3}(\eta_{K-3}), \qquad v_{K-3} = \beta_{K-3}(\eta_{K-3}). \tag{3.7}$$

Now the question is whether this solution is "informationally unique."
It would have been informationally nonunique if we were able to express $x_{K-2}$
and/or $x_{K-3}$ in terms of the values of the state variables at earlier stages
[see the argument in Ref. 2 for the case of a first-order model].  But this is
not possible here because of the presence of the noise term in the state equation
(2.1).  Hence, the structural form (3.6) is informationally unique at stage $K-2$.
Of course, to complete the description of $(\gamma_{K-2}, \beta_{K-2})$ we need
expressions for $(\gamma_{K-3}, \beta_{K-3})$, which we derive next.


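We now substitute (3.6) into (3.2) to obtain

$$J_i^{K-2}(\gamma_{K-2}, \beta_{K-2}) = \alpha^i_{K-2}(x_{K-2}, x_{K-3}, u_{K-3}, v_{K-3}) \tag{3.8}$$

where $\alpha^i_{K-2}$ is some function with the given arguments.  To determine the
static stochastic game at stage $K-3$, we first start with

$$J_i^{K-3}(\gamma_{K-3}, \beta_{K-3}) = E\{ \alpha^i_{K-2}(\cdot) + g^i(K-3, x_{K-3}, u_{K-3}, v_{K-3}) \} \tag{3.9}$$

and then express $x_{K-2}$ in terms of $x_{K-3}$, $x_{K-4}$, $u_{K-4}$, $v_{K-4}$,
$w_{K-4}$, to obtain

$$J_i^{K-3} = E\{ h^i_{K-3}(x_{K-3}, u_{K-3}, v_{K-3}, \bar u_{K-3}, \bar v_{K-3}, x_{K-4}, u_{K-4}, v_{K-4}, w_{K-4}) \} \tag{3.10}$$

where

$$u_{K-4} = \gamma_{K-4}(\eta_{K-4}), \qquad v_{K-4} = \beta_{K-4}(\eta_{K-4})$$

and expectation is over the statistics of $w_{K-4}$.  Note also the presence of
$\bar u_{K-3}$ and $\bar v_{K-3}$ in (3.10), which come from $\alpha^i_{K-2}$.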

Now we solve for the Nash equilibrium solution of the static game
described by (3.10) after averaging over $w_{K-4}$.  Only $u_{K-3}$ and $v_{K-3}$
are the variables here, and assuming that a unique Nash equilibrium solution to
this static game exists, it will be of the form

$$u_{K-3} = \tilde\gamma_{K-3}(p_{K-3}, \bar u_{K-3}, \bar v_{K-3}), \qquad v_{K-3} = \tilde\beta_{K-3}(p_{K-3}, \bar u_{K-3}, \bar v_{K-3}) \tag{3.11}$$

where

$$p_{K-3} = (x_{K-3}, x_{K-4}, u_{K-4}, v_{K-4}). \tag{3.12}$$

Because of consistency, we should have $\bar u_{K-3} = u_{K-3}$,
$\bar v_{K-3} = v_{K-3}$, and hence using this in (3.11) and solving for
$(u_{K-3}, v_{K-3})$ we obtain, for some appropriate $\hat\gamma_{K-3}$,
$\hat\beta_{K-3}$,

$$u_{K-3} = \hat\gamma_{K-3}(p_{K-3}), \qquad v_{K-3} = \hat\beta_{K-3}(p_{K-3}) \tag{3.13}$$

which replaces (3.5).  Therefore, if $(\gamma, \beta)$ is a Nash equilibrium
solution, at stage $K-3$ we necessarily have the relationship

$$\gamma_{K-3}(\eta_{K-3}) = \hat\gamma_{K-3}(p_{K-3}), \qquad \beta_{K-3}(\eta_{K-3}) = \hat\beta_{K-3}(p_{K-3}). \tag{3.14}$$

Furthermore, this is the informationally unique Nash equilibrium because of the
argument made in the derivation at stage $K-2$.  The conclusion then is that the
Nash equilibrium solution at stage $K-3$ depends, in general, on $x_{K-3}$,
$x_{K-4}$ and the values of the state at earlier stages, through $\gamma_{K-4}$ and
$\beta_{K-4}$.  This, in turn, implies for stage $K-2$, using (3.6), that
$\gamma_{K-2}$ and $\beta_{K-2}$ depend, in general, on $x_{K-2}$, $x_{K-3}$,
$x_{K-4}$ and the values of the state at earlier stages through $\gamma_{K-4}$ and
$\beta_{K-4}$.

This procedure, applied at stage $K-3$, can be followed up for other
stages in a descending order and iteratively, to obtain finally
$(\gamma_0, \beta_0)$, the pair of equilibrium strategies at time $k=0$.  Then,
moving in forward time, the entire set of equilibrium strategies can be determined
by using relations like (3.14) and (3.6) through recursive substitution.  This
entire procedure will be illustrated in the next section, using a numerical
example.

The conclusions that can be drawn at this point are that: (i) the
dynamic game of Section 2 generically admits an equilibrium solution; (ii) this
equilibrium solution can be derived (in principle) using an iterative procedure
which sweeps the time interval twice (once forwards and once backwards), and
in that sense the method is in the spirit of the one employed in [3] in a different
context; (iii) the equilibrium solution does not exhibit informational nonuniqueness.
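
The forward sweep by recursive substitution can be sketched for the scalar linear case under explicit assumptions. Here a policy is stored as a linear form in the observed states, `{j: coefficient of x_j}`, and each backward-sweep relation is assumed to have the shape $u_k = a_u \cdot x + c_u(\bar u_{k-1} + \bar v_{k-1})$ (and similarly for $v_k$). The data layout and the function names are hypothetical; the coefficients passed in are supplied by the user and are not the paper's.

```python
from fractions import Fraction

def add_scaled(target, source, scale):
    """In place: target += scale * source, for linear forms stored as
    {state index j: coefficient of x_j}."""
    for j, coef in source.items():
        target[j] = target.get(j, Fraction(0)) + scale * coef

def forward_sweep(u0, v0, relations):
    """Resolve stage relations into policies that depend on states only.

    u0, v0    : stage-0 policies as linear forms in the states.
    relations : one tuple (a_u, c_u, a_v, c_v) per later stage k, meaning
                u_k = a_u . x + c_u * (u_{k-1} + v_{k-1}), likewise v_k.
    Returns the lists of resolved linear forms for u_0.. and v_0.. .
    """
    us, vs = [dict(u0)], [dict(v0)]
    for a_u, c_u, a_v, c_v in relations:
        prev_sum = {}
        add_scaled(prev_sum, us[-1], Fraction(1))
        add_scaled(prev_sum, vs[-1], Fraction(1))
        u_k, v_k = dict(a_u), dict(a_v)
        add_scaled(u_k, prev_sum, c_u)
        add_scaled(v_k, prev_sum, c_v)
        us.append(u_k)
        vs.append(v_k)
    return us, vs
```

Exact rational arithmetic is used so that the resolved coefficients can be compared term by term with hand calculations.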


4.  A  Numerical  Example 

As an illustration of the procedure presented in the previous section,
we consider here a 4-stage dynamic game with scalar dynamics

$$x_{k+2} = 2x_{k+1} - x_k + u_k + v_k + w_k$$

where $x_0$ and $x_1$ are specified a priori, and $\{w_k\}$ are zero-mean i.i.d.
Gaussian random variables.  The information structure is closed-loop perfect
state for both players and the cost functions are

$$J_1 = E\Big\{ [x_4]^2 + \sum_{k=0}^{2} [u_k]^2 \Big\}$$

$$J_2 = E\Big\{ [x_4]^2 + \sum_{k=0}^{2} [v_k]^2 \Big\}.$$

Let

$$\{u_0 = \gamma_0(x_0),\ u_1 = \gamma_1(x_0, x_1),\ u_2 = \gamma_2(x_0, x_1, x_2);\ v_0 = \beta_0(x_0),\ v_1 = \beta_1(x_0, x_1),\ v_2 = \beta_2(x_0, x_1, x_2)\}$$

be a Nash equilibrium solution for this game.  Then, following the procedure of
Section 3, we have:

Stage 2:  Holding $u_0, u_1, v_0, v_1$ fixed as given, for $\{u_2, v_2\}$ to be in
equilibrium with these it is necessary and sufficient (sufficiency follows from
strict convexity of $J_1$ and $J_2$) that

$$\frac{\partial}{\partial u_2} E\{[x_4]^2 + [u_2]^2\} = 0, \qquad \frac{\partial}{\partial v_2} E\{[x_4]^2 + [v_2]^2\} = 0. \tag{4.1}$$

Since

$$x_3 = 2x_2 - x_1 + \bar u_1 + \bar v_1 + w_1$$

$$x_4 = 2x_3 - x_2 + u_2 + v_2 + w_2,$$

using the fact that $\{w_1, w_2\}$ is an independent sequence, we obtain for (4.1)

$$u_2 = -(1/2)[3x_2 - 2x_1 - 2\bar u_1 - 2\bar v_1] - (1/2)v_2$$

$$v_2 = -(1/2)[3x_2 - 2x_1 - 2\bar u_1 - 2\bar v_1] - (1/2)u_2,$$

and solving for $\{u_2, v_2\}$ we further obtain the unique relationship (which is
the counterpart of (3.5)):

$$u_2 = \hat\gamma_2(x_1, x_2, \bar u_1, \bar v_1) = -(1/3)[3x_2 - 2x_1 - 2\bar u_1 - 2\bar v_1]$$

$$v_2 = \hat\beta_2(x_1, x_2, \bar u_1, \bar v_1) = -(1/3)[3x_2 - 2x_1 - 2\bar u_1 - 2\bar v_1]. \tag{4.2}$$

Hence, if $(\gamma, \beta)$ is a Nash equilibrium, we necessarily have at stage $k=2$,

$$\gamma_2(x_2, x_1, x_0) = \hat\gamma_2(x_1, x_2, \bar u_1, \bar v_1)$$

$$\beta_2(x_2, x_1, x_0) = \hat\beta_2(x_1, x_2, \bar u_1, \bar v_1) \tag{4.3}$$

with $\bar u_1 = \gamma_1(x_0, x_1)$, $\bar v_1 = \beta_1(x_0, x_1)$,
and this is also a unique representation because we cannot express $x_2$ in terms
of $x_1$ and $x_0$ without introducing an error.  Note that we still have to
determine $(\gamma_1, \beta_1)$ to complete the description of $\gamma_2$ and
$\beta_2$.
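
The step from the pair of reaction relations to (4.2) is the solution of two linear equations in two unknowns. Writing $A$ for the bracketed term common to both relations, they read $u_2 = -A/2 - v_2/2$ and $v_2 = -A/2 - u_2/2$, and the sketch below solves such a pair exactly. The helper name and its parameterization ($u = a_1 + b_1 v$, $v = a_2 + b_2 u$) are assumptions made for illustration.

```python
from fractions import Fraction

def solve_linear_reactions(a1, b1, a2, b2):
    """Solve u = a1 + b1*v and v = a2 + b2*u exactly.

    A unique solution exists only when 1 - b1*b2 is nonzero.
    """
    det = 1 - b1 * b2
    if det == 0:
        raise ValueError("no unique solution: 1 - b1*b2 is zero")
    u = (a1 + b1 * a2) / det
    v = (a2 + b2 * a1) / det
    return u, v
```

With $a_1 = a_2 = -A/2$ and $b_1 = b_2 = -1/2$ this returns $u_2 = v_2 = -A/3$, which is the structure of (4.2) whatever the value of $A$.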

Stage 1.  Here we take $u_2$ and $v_2$ as given by (4.2) and hold $(u_0, v_0)$
fixed.  Then, to obtain expressions for $\gamma_1$ and $\beta_1$, we substitute
(4.2) into the state equations for $x_4$, $x_3$ and $x_2$, and differentiate the
resulting expressions for $J_1$ and $J_2$ with respect to $u_1$ and $v_1$,
respectively.  This leads to expressions for $u_1$ and $v_1$ in terms of
$x_1, x_0, \bar u_1, \bar v_1, \bar u_0, \bar v_0$ [this is the counterpart of
(3.11)], and requiring consistency in the solution we let $\bar u_1 = u_1$,
$\bar v_1 = v_1$, to obtain the unique solution (as counterpart of (3.13))

$$u_1 = \hat\gamma_1(x_1, x_0, \bar u_0, \bar v_0) = -(6/43)[(4/3)x_1 - x_0 + \bar u_0 + \bar v_0]$$

$$v_1 = \hat\beta_1(x_1, x_0, \bar u_0, \bar v_0) = -(6/43)[(4/3)x_1 - x_0 + \bar u_0 + \bar v_0]. \tag{4.4}$$

The uniqueness here follows again from strict convexity of $J_1$ and $J_2$.
Hence, if $(\gamma, \beta)$ is a Nash equilibrium solution, we necessarily have

$$\gamma_1(x_1, x_0) = \hat\gamma_1(x_1, x_0, \bar u_0, \bar v_0)$$

$$\beta_1(x_1, x_0) = \hat\beta_1(x_1, x_0, \bar u_0, \bar v_0) \tag{4.5}$$

and this is a unique representation because $x_1$ does not depend on $x_0$.

Note that this is still not in final form because of dependence on the
Nash equilibrium policies at stage 0, $(u_0, v_0)$.  One conclusion we can
draw here, though, is that if $(u_0, v_0)$ is unique, then $(\gamma_1, \beta_1)$
will be unique via (4.5) and (4.4), and in turn $(\gamma_2, \beta_2)$ will be
unique via (4.3) and (4.2).

Stage 0.  To obtain the expressions for $(u_0, v_0)$, we take $(u_1, v_1)$ as
given by (4.4), and $(u_2, v_2)$ as given by (4.2), with $(\bar u_1, \bar v_1)$
substituted from (4.4); then we evaluate $x_4$, $u_2$ and $v_2$ to obtain

$$x_4 = (4/43)x_1 - (3/43)x_0 - (40/43)[\bar u_0 + \bar v_0] + u_0 + v_0 + w_0 + 2w_1 + w_2$$

$$u_2 = v_2 = -(68/43)x_1 + (51/43)x_0 - u_0 - v_0 - w_0 - (8/43)[\bar u_0 + \bar v_0].$$

Then, performing the optimizations

$$\min_{u_0} E\{[x_4]^2 + [u_2]^2 + [u_1]^2 + [u_0]^2\}$$

$$\min_{v_0} E\{[x_4]^2 + [v_2]^2 + [v_1]^2 + [v_0]^2\}$$

we obtain the unique relationships (note that $u_1$ does not depend on $u_0$, and
$v_1$ does not depend on $v_0$)

$$2E\{x_4 - u_2\} + 2u_0 = 0, \qquad 2E\{x_4 - v_2\} + 2v_0 = 0,$$

that is,

$$(72/43)x_1 - (54/43)x_0 + 3u_0 + 2v_0 - (32/43)[\bar u_0 + \bar v_0] = 0$$

$$(72/43)x_1 - (54/43)x_0 + 3v_0 + 2u_0 - (32/43)[\bar u_0 + \bar v_0] = 0.$$

For consistency, setting $\bar u_0 = u_0$, $\bar v_0 = v_0$, we obtain the unique
solution of this set of simultaneous equations to be

$$u_0^* = \gamma_0^*(x_0, x_1) = (6/61)(3x_0 - 4x_1)$$

$$v_0^* = \beta_0^*(x_0, x_1) = (6/61)(3x_0 - 4x_1) \tag{4.6}$$

which are the unique Nash policies at stage 0.

This, then, completes the derivation in retrograde time.  We now have to sweep
the stages in forward time to complete the expressions for the Nash policies at
other stages.  Towards this end, we first use (4.6) in (4.4) for
$(\bar u_0, \bar v_0)$, to arrive at

$$u_1^* = \gamma_1^*(x_0, x_1) = [150/((43)(61))][x_0 - 4x_1]$$

$$v_1^* = \beta_1^*(x_0, x_1) = [150/((43)(61))][x_0 - 4x_1]. \tag{4.7}$$

Finally, using (4.7) in (4.2) for $(\bar u_1, \bar v_1)$ we obtain

$$u_2^* = \gamma_2^*(x_0, x_1, x_2) = -x_2 + (2/2623)[(2023/3)x_1 + 50x_0]$$

$$v_2^* = \beta_2^*(x_0, x_1, x_2) = -x_2 + (2/2623)[(2023/3)x_1 + 50x_0]. \tag{4.8}$$

Hence, $\gamma^* = (\gamma_0^*, \gamma_1^*, \gamma_2^*)$;
$\beta^* = (\beta_0^*, \beta_1^*, \beta_2^*)$, as given by (4.6)-(4.8), constitutes
the unique Nash equilibrium solution for the dynamic game (with second-order
state equation) formulated in this section.  It should be noted that the
equilibrium policies use complete memory, not only the current values of
the state.  A second observation to be made is that the unique solution for the
linear-quadratic game is linear in the available state information.  Some
scrutiny reveals that this is in fact a property that is shared by all
linear-quadratic dynamic games (that is, games for which, in the framework of
the general formulation of Section 2, $f$ is linear, and $g^i$, $q^i$ are
quadratic).  In other words, for linear-quadratic dynamic games with second (or
higher) order dynamics, and with the additive noise satisfying the properties
elucidated in Section 2, the Nash equilibrium solution will generically exist,
will be unique and linear, and will depend not only on the current value of the
state but also on memory.  A precise verification of this result is notationally
cumbersome, but it follows from an otherwise routine application of the
procedure of Section 3 to linear-quadratic games.

5.  Concluding  Remarks 

In this paper we have addressed a class of noncooperative dynamic games which
have not been treated before and to which the currently available theory does not
apply.  This class involves discrete-time game models with state dynamics of
second order, and with additive noise disturbance satisfying certain regularity
conditions.  It has been shown, through an intricate set of arguments, that the
Nash equilibrium solution will be informationally unique, whenever it exists, and
can be obtained by an iterative procedure which sweeps the time interval twice.

In the case of linear-quadratic games and under the closed-loop information
pattern for all players, the informationally unique Nash solution is linear in
the current and past values of the state, and can be obtained explicitly.

An  obvious,  but  challenging,  extension  of  the  theory  presented  here 
would  be  to  continuous-time  problems  with  second  or  higher  order  dynamics  and 
subject  to  additive  noise  with  independent  increments.  It  is  anticipated  that 
the  procedure  developed  here  for  the  discrete  time  problem  would  have  a  natural 
counterpart  in  this  case,  with  the  structural  properties  of  the  equilibrium 


solution as presented here remaining intact.

Abstract.  In this paper we first provide a brief review of some of the recent results
on dynamic games, in particular with regard to memory strategies, and then discuss
potential applications of the techniques developed in this context to large scale
systems design, optimization and coordination.  Incorporation of memory strategies in
the optimization of interconnected systems enables one to enforce satisfiability of
more than one criterion and to treat coordination problems wherein the lower levels
have different perceptions of the underlying model and goals.  A number of such design
criteria are introduced, and recipes are given to obtain coordinator policies with good
sensitivity properties.

Keywords.  Large scale systems, interconnected systems, dynamic games, coordination,
minimum sensitivity analysis.


1.  INTRODUCTION

This paper discusses the role of memory strategies
in the optimization and robustness considerations
of interconnected systems controlled by several
decision makers and under possibly different
performance measures.  It is a known fact in control
theory that the use of control laws which also
incorporate the past values of the state vector
leads to better system performance in terms of
robustness, and minimum sensitivity to changes in
the nominal values of the system parameters and to
external disturbances.  Even in linear deterministic
systems with partially unknown parameters, or
with parameters which are slowly varying (drifting),
use of higher order compensators (instead of pure
feedback control laws) leads to more acceptable
controller designs (Kokotovic, et al., 1963;
Sundararajan and Cruz, 1971) with better sensitivity
properties in case of small deviations from the
nominally adopted model.  Inclusion of the
past values of the state variables in the controller
design procedure also brings in redundancy in
information which, if used judiciously, leads to
considerable improvement in overall system performance.

Parallel to the centralized case, incorporation of
memory in controller design also gains paramount
importance in the optimization, decentralized
control and coordination of interconnected systems.
The objective of this paper is to discuss this role
of memory policies in multi-station multi-goal
optimization and control problems, and to convey
to the reader the important message that sensitivity
analysis finds a natural place and extension in
such systems.  The paper is of tutorial nature
and emphasizes more the underlying concepts and
methodology rather than derivation of specific
results.  However, the reader could use the recipes
given in the paper to obtain robust, minimum-sensitive
control policies for interconnected
systems with specific structures.

Crucial in this development is the role of memory
policies in dynamic games, particularly with regard
to existence and uniqueness of various types
of equilibrium.  For this reason, the next section
provides a brief description of and a perspective
on dynamic games, with emphasis on recent developments
and informational redundancy.  Section 3
discusses ways of utilizing informational redundancy
in obtaining robust coordinator policies
with appealing sensitivity properties in the face
of modelling inaccuracies.  Section 4 introduces
a different type of discrepancy and evaluates
memory policies from that perspective.  The paper
ends with the concluding remarks of Section 5.


2.  A GAME-THEORETIC MODEL AND
SOME SALIENT CHARACTERISTICS

To fix the ideas and to have a unifying framework
to work in, let us consider an n-dimensional system
described in continuous time and controlled by
three different stations:

$$\dot x = f(x, u, v, w, \alpha); \qquad x(t_0) = x_0; \qquad t \ge t_0 \tag{2.1}$$

.-.ere  u « .* ,  w  ire  the  tor,  t  r  0  *  *ectors  t  station  a  .r 

tame  tne  'r";  •  1,  T  me  t,  r esc e 
.  ues  1 n  _uc*  ide.in  scates  ot 


Iv,  t  i  .*  ..  n  g  va  ; 


jprr  .an.ic-?  c imen^ : ons  ana  sac  isr**ing  some  -mo 


ness  nail  ion 
rurt her mere ,  . 

meter,  renre-^e 
: evm ■ 1 n  t 


,  !ut,  is  -  iecevise  zone  ir.c:  C  in 
same  nossiol*/  *ec t '  r-'-.i  1  te-d  ~  1 
tint  tre  reading  me er taint"  *r* 




ul  t  »  *  •  ,  ,V'C)  *  , 

w;t)  *  .".jCu )  , 

where we assume that each $\gamma_i$ is sufficiently smooth so
that (2.1) admits a unique solution for each triple
$(\gamma_1, \gamma_2, \gamma_3)$.

Now, if the stations have possibly different objective
functions, say

$$J_i(\gamma_1, \gamma_2, \gamma_3) = h_i(x(t_f)) + \int_{t_0}^{t_f} g_i(x(t), u(t), v(t), w(t))\, dt, \qquad i = 1, 2, 3 \tag{2.2}$$

where $t_f$ denotes the terminal time, a relevant
noncooperative solution concept in this context is the
so-called Nash equilibrium $\gamma^* = (\gamma_1^*, \gamma_2^*, \gamma_3^*)$ which
satisfies

$$J_i(\gamma^*) \le J_i(\gamma^*_{-i}) \quad \text{for all admissible } \gamma_i, \qquad i = 1, 2, 3,$$

where

$$\gamma^*_{-i} = \begin{cases} (\gamma_1, \gamma_2^*, \gamma_3^*), & i = 1 \\ (\gamma_1^*, \gamma_2, \gamma_3^*), & i = 2 \\ (\gamma_1^*, \gamma_2^*, \gamma_3), & i = 3. \end{cases} \tag{2.3}$$

For the case when all $J_i$'s are the same function,
and $\eta_i$'s are equivalent, this problem becomes
equivalent to a centralized optimal control problem
(with the three stations considered as a single
station), and (2.3) represents in this case a weaker
version of the (team) optimality condition.


One of the important results of dynamic game theory
says that if the information pattern $\eta = (\eta_1, \eta_2, \eta_3)$
incorporates some memory (i.e., redundant state
information) for at least one station, then in
general the Nash solution $\gamma^*$ is nonunique (in fact
there are infinitely many), leading to infinitely many
possible equilibrium cost triples $(J_i(\gamma^*),\ i = 1, 2, 3)$
whenever the $J_i$'s are different (see Basar, 1977; Basar
and Olsder, 1982).  Hence it seems that, at least
at the outset, presence of redundancy in the available
information leads to ambiguity in the solution
of the game problem, making the entire analysis
totally worthless.  Fortunately, however, this is
not the entire story, and the situation could be
salvaged by introducing an element of coordination
into the model.  In fact, it will turn out that
this nonuniqueness of equilibria (so-called
"informational nonuniqueness") is a blessing in disguise,
which can be used to considerable advantage if the
model is set up properly.


memory. However, if the memory policy of the
leader is chosen judiciously, it could break the
Nash game into two independent (decoupled)
optimization problems (one for each station), thereby
eliminating the informational nonuniqueness
associated with Nash equilibria. This is possible
because whenever the leader's policy incorporates
memory, he is in general able to "control" the
structure of the cost functions faced by the other
two stations without incurring degradation in his
own performance (Basar and Selbuz, 1979; Ho, Luh
and Muralidharan, 1981).

(ii) Utilization of an appropriate memory strategy
by the leader could also lead (together with the
other two stations' optimal response policies) to
a Pareto efficient solution, which could be defined,
for example, as the minimization of a convex
combination of the two stations' cost functions.

This property has been established in the literature
in different frameworks, when at the lower level
there are two stations (see Basar, 1980) or a single
station (Basar and Selbuz, 1979; Papavassilopoulos
and Cruz, 1979; Tolwinski, 1981).

Optimum coordinating memory policies referred to
above are not in general unique, even though each
one leads to the same performance from the
coordinating station's point of view. Hence, the issue
here is not that there is still ambiguity in the
proposed solution (or that the problem with the
coordinator is ill-posed), but that this nonuniqueness
brings in additional degrees of freedom which
could be used to our advantage in improving the
overall performance of the system in case of
inaccuracies in system modeling and/or goal
perceptions. The former type of inaccuracy is discussed
in the next section, and the latter type is
discussed in Section 4.
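The nonuniqueness of memory policies can be made concrete on a toy problem. The sketch below is an illustration under assumptions that are not in the text: a scalar two-stage system, a single lower level station, a common quadratic cost, and a family of coordinator policies indexed by a hypothetical gain `k`. All members of the family agree along the nominal trajectory and differ only off it.

```python
# Hypothetical two-stage scalar system (not from the text):
#   x1 = x0 + v   (lower level station acts first)
#   x2 = x1 + u   (coordinator acts second and remembers x0)
# Common cost x2^2 + u^2 + v^2, whose optimum is v = u = -x0/3,
# so that the nominal intermediate state is x1 = 2*x0/3.


def nominal_x1(x0):
    """Intermediate state on the optimal (nominal) trajectory."""
    return 2.0 * x0 / 3.0


def memory_policy(k, x1, x0):
    """Closed-loop term plus k times the deviation from the nominal path."""
    return -0.5 * x1 + k * (x1 - nominal_x1(x0))


x0 = 3.0
gains = [0.0, 0.5, 2.0]

# On the nominal path every member of the family applies the same control.
on_path = [memory_policy(k, nominal_x1(x0), x0) for k in gains]

# Off the nominal path the members of the family respond differently.
off_path = [memory_policy(k, nominal_x1(x0) + 0.3, x0) for k in gains]

print(on_path)
print(off_path)
```

The gain `k` is the kind of extra degree of freedom the paragraph above refers to: it is invisible as long as every station uses the nominal model, and it only matters once some station deviates.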


3. COORDINATING POLICIES WITH OPTIMUM
SENSITIVITY PROPERTIES: MODELLING
INACCURACIES

Let us now go back to the general model (2.1) where
$\alpha$ is an unknown parameter vector, with the uncertainty
being in a small neighborhood of a nominal value
$\alpha^o$. Now, with $\alpha$ fixed at $\alpha^o$, and known by all
parties, let us denote the class of all optimal
coordinator policies $\gamma_1$ by $\Gamma_1^o$, which will in general
have infinitely many elements when policies are
allowed to depend also on the past values of the
state. Hence, for all $\gamma_1^o \in \Gamma_1^o$, the dynamic Nash

game  described  bv  the  cost  functionals  <2.2*: 


Towards that end, let us now endow one of the
stations, say station 1, with additional power or
authority, to see that (again in a decentralized
framework) an acceptable set of controllers are
chosen by all stations, satisfying, perhaps, some
pre-set conditions. With such an asymmetry in the
roles of the stations (decision makers) we call
station 1 the leader (or the coordinator) and
stations 2 and 3 the followers, adopting the
terminology of Stackelberg games. Here, station 1
announces a policy (a control law $\gamma_1$) to which
the other stations respond by minimizing their cost
functions $J_2$ and $J_3$, for stations 2 and 3,
respectively. Anticipating these responses, the leader
decides on a policy (which could also be called the
"coordinating policy") which leads to an overall
satisfactory performance. Such a policy, provided
that it incorporates memory, could lead to a
well-posed optimization problem at the lower level and
also remove the inefficiency of Nash equilibria.

Let us now elaborate on these two points.

"or  “  j.:-i  ir-itraril-  inn  our.  od  mono  v  noli;** 

<  . e  1 0  —  r  .  t  tw«-  scut  t  ->  it  t  le  . ;vor  - 

*-  f  iced  with  *  Nus  .  c  .me  wr  ich  :n*il  p  still 
I-**-  !  s*- 1  11. s-  •:  5m*i:u::v  in  *:  v  i'-nimi- 


$$\{J_2(\gamma_1^o, \gamma_2, \gamma_3),\ J_3(\gamma_1^o, \gamma_2, \gamma_3)\} \qquad (3.1)$$

and state equation

$$\dot{x} = f(x, \gamma_1^o(\eta_1), v, w, \alpha^o); \quad x(t_0) = x_0 \qquad (3.2)$$

will be well-defined, and its solution will be the
same regardless of which element is chosen out of
$\Gamma_1^o$. Therefore, as far as the nominal model goes,
$\Gamma_1^o$ constitutes the solution set for the coordinator.
[Note that this discussion also pertains to the
case when $J_2$ and $J_3$ are identical, that is when
we have a team problem at the lower level.] Now,
we are faced with the secondary (but important)
problem of introducing an additional optimality
criterion on $\Gamma_1^o$ so as to absorb the freedom involved
in the choice of optimal policies out of $\Gamma_1^o$.
This optimality criterion is introduced as follows:


We have earlier remarked that $\alpha$ will in general
take values in a neighborhood of $\alpha^o$


.•  :*»  t  r  'l  1  -- r  -  -  i- 
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that  the  coordinator  may  broadcast  the  nominal 
value $\alpha^o$ to the lower level controllers, but because
of noise in the transmission the actual value
received by them may show a small variation from $\alpha^o$,
say $\alpha^o + \Delta\alpha = \alpha$. Let us now pursue this discussion
somewhat further and study the effect of such a
deviation from the nominal model on the nominal
equilibrium solution, when both lower level
controllers perceive the underlying model as described
by

$$\dot{x} = f(x, \gamma_1(\eta_1), v, w, \alpha); \quad x(t_0) = x_0 \qquad (3.3)$$

where $\alpha = \alpha^o + \Delta\alpha$, and $\gamma_1 \in \Gamma_1^o$ is an announced policy
of the coordinator. [Note that here both controllers
use the same (albeit different from the nominal)
model. The case when $\alpha$ is different for two
controllers can also be studied, but the analysis
is more involved.]


With the cost functions as described by (3.1), and
system model by (3.3), we have a two-person Nash
game whose solution can be obtained, under the given
information pattern, using the available theory
(Basar and Olsder, 1982), which will in general be
nonunique if the dynamic information at the lower
level involves memory and $J_2$ is intrinsically
different from $J_3$. However, if $\gamma_1$ is chosen judiciously,
the lower level Nash game will be well-defined
informationally, admitting a unique solution for each
adopted information pattern, as discussed earlier in
Section 2. Hence, we assume that $\bar{\Gamma}_1 \subset \Gamma_1^o$ is in fact such
a class of policies for the coordinator (this is
indeed a rich class, in general), which will
trivially be the case if the lower level problem
is a team (i.e., $J_2 = J_3$). The solution at this
lower level will in general depend on different
choices for $\gamma_1 \in \bar{\Gamma}_1$ and the value $\alpha$. To indicate
this implicit dependence, let us denote the optimal
solution at the lower level by
$\{\gamma_2(\eta_2; \gamma_1, \alpha),\ \gamma_3(\eta_3; \gamma_1, \alpha)\}$.
When this pair of policies is substituted
into the state equation (3.2) for the
nominal model, and the resulting equation solved,
we arrive at a trajectory which again will vary
with $\gamma_1$ and $\alpha$, with the dependence on $\alpha$ being only
through $\gamma_2$ and $\gamma_3$. Finally, when this trajectory,
together with $\gamma_1$, $\gamma_2$, $\gamma_3$, are substituted into the
coordinator's cost functional, the resulting expression
for $J_1$ will depend on $\alpha$ and the choice $\gamma_1$ out
of $\bar{\Gamma}_1$. Let us denote this by

$$F(\gamma_1, \alpha) = J_1\big(\gamma_1,\ \gamma_2(\cdot\,; \gamma_1, \alpha),\ \gamma_3(\cdot\,; \gamma_1, \alpha)\big). \qquad (3.4)$$


One important property of $F$ is that, when
$\alpha = \alpha^o$, it is independent of $\gamma_1 \in \bar{\Gamma}_1$;
that is, $F(\gamma_1, \alpha^o)$ is a
constant over $\bar{\Gamma}_1$.
[Because, as indicated earlier,
when the same nominal model is used at all levels,
every policy of the coordinator out of $\Gamma_1^o$ leads to
the same performance and the same trajectory.]


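For $\alpha \neq \alpha^o$,
since $F$ varies with $\gamma_1$ over $\bar{\Gamma}_1$, a natural question
to ask here is "What is the best choice out of $\bar{\Gamma}_1$
which renders the sensitivity of optimum $J_1$ to
changes in the value of $\alpha$ from the nominal value
$\alpha^o$ minimum?". Since, by construction,

$$F(\gamma_1, \alpha) \ge F(\gamma_1, \alpha^o) \quad \forall\, \gamma_1 \in \bar{\Gamma}_1, \qquad (3.5)$$

this question can be rephrased in mathematical
terms as

$$\min_{\gamma_1 \in \bar{\Gamma}_1} F(\gamma_1, \alpha), \quad \alpha \in N_\epsilon(\alpha^o) \qquad (3.6)$$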

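where $N_\epsilon(\alpha^o)$ denotes a given $\epsilon$-neighborhood of $\alpha^o$.
Since $\alpha$ is not fixed, (3.6) is still ambiguous,
and therefore, we have to adopt either a worst case
approach,

$$\min_{\gamma_1 \in \bar{\Gamma}_1}\ \sup_{\alpha \in N_\epsilon(\alpha^o)} F(\gamma_1, \alpha),$$

or an "incremental deviation sensitivity" approach,
which is more in line with the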


1  f  '  r  '  i  r.  c  1  e 


c.i".:  'n 


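In this latter approach we minimize a weighted
first-order or second-order sensitivity function
associated with $F$ and with regard to the parameter
$\alpha$ around the nominal value $\alpha^o$. If $\alpha \in \mathbb{R}^n$, then the
first order differential is:

$$\delta F(\gamma_1, \alpha) = \sum_{i=1}^{n} \left[ \frac{dF(\gamma_1, \alpha)}{d\alpha_i} \right]_{\alpha = \alpha^o} (\alpha_i - \alpha_i^o) \qquad (3.8)$$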


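Generally this expression will either be zero, or
be nonzero but independent of $\gamma_1$, so that we will
have to consider the next leading term:

$$\delta^2 F(\gamma_1, \alpha) = \sum_{i=1}^{n} \sum_{j=1}^{n} \left[ \frac{d^2 F(\gamma_1, \alpha)}{d\alpha_i\, d\alpha_j} \right]_{\alpha = \alpha^o} (\alpha_i - \alpha_i^o)(\alpha_j - \alpha_j^o) \qquad (3.9)$$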


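which will explicitly depend on $\gamma_1$. Then, a
meaningful criterion will be the minimization of a
suitable norm of the nonnegative definite matrix

$$M(\gamma_1) = \left\{ \left[ \frac{d^2 F(\gamma_1, \alpha)}{d\alpha_i\, d\alpha_j} \right]_{\alpha = \alpha^o};\ i, j = 1, \ldots, n \right\} \qquad (3.10a)$$

Let

$$\gamma_1^* = \arg\min_{\gamma_1 \in \bar{\Gamma}_1} \| M(\gamma_1) \| \qquad (3.10b)$$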


where $\|\cdot\|$ denotes a suitable matrix norm.
Feasibility of this minimization problem will, of course,
depend on the general structures of $F$ and $\bar{\Gamma}_1$, which
will in turn depend on the structures of the cost
functions $J_1$, $J_2$, $J_3$, and the function $f$ which
characterizes the state equation; specific
results could be obtained by assuming specific
structures for $f$, $h_i$ and $g_i$ in (2.1)-(2.2). What
is true independent of what the structures of
these solutions are, however, is the fact that $\gamma_1^*$
as defined by (3.10b) serves two important roles
as an optimum coordinator strategy:

(i) It ensures that when the nominal model is
adopted by all stations, a certain acceptable
performance is attained at the upper level, and the
optimization problem faced by the lower level
stations is well-posed and even decoupled.

(ii) If the lower level stations deviate from
the nominal model when computing their optimum
response controls, the effect of this deviation on
the performance at the upper level is minimal.

This, of course, is all possible provided that the
coordinator is allowed to use memory policies.
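The selection step in (3.10b) can be sketched numerically on a toy problem. Everything specific below is an assumption made for illustration and is not taken from the text: a scalar two-stage system with a single lower level station, a common quadratic cost, an uncertain first-stage gain `alpha` with nominal value 1, and a one-parameter family of coordinator memory policies indexed by a hypothetical gain `k`. The lower level station plans with its perceived `alpha`, while the system actually evolves with the nominal value.

```python
ALPHA_NOM = 1.0  # nominal value of the uncertain parameter (assumed)


def leader_policy(k, x1, x0):
    """Memory policy: closed-loop term plus k times the deviation of x1
    from its nominal value 2*x0/3."""
    return -0.5 * x1 + k * (x1 - 2.0 * x0 / 3.0)


def cost(k, v, x0, alpha):
    """Common cost x2^2 + u^2 + v^2 when x1 = x0 + alpha*v, x2 = x1 + u."""
    x1 = x0 + alpha * v
    u = leader_policy(k, x1, x0)
    x2 = x1 + u
    return x2 ** 2 + u ** 2 + v ** 2


def follower_response(k, alpha, x0):
    """Exact minimiser of the perceived cost, which is quadratic in v:
    fit a*v^2 + b*v + c through three points and return -b/(2a)."""
    j_m, j_0, j_p = (cost(k, v, x0, alpha) for v in (-1.0, 0.0, 1.0))
    a = 0.5 * (j_p + j_m) - j_0
    b = 0.5 * (j_p - j_m)
    return -b / (2.0 * a)


def F(k, alpha, x0=1.0):
    """Coordinator's cost: response computed with alpha, evaluated on the
    nominal model."""
    v = follower_response(k, alpha, x0)
    return cost(k, v, x0, ALPHA_NOM)


def second_order_sensitivity(k, h=1e-3):
    """Central-difference estimate of d^2 F / d alpha^2 at the nominal value."""
    return (F(k, ALPHA_NOM + h) - 2.0 * F(k, ALPHA_NOM)
            + F(k, ALPHA_NOM - h)) / h ** 2


candidates = [i / 10.0 for i in range(0, 11)]  # k = 0.0, 0.1, ..., 1.0
best_k = min(candidates, key=lambda k: abs(second_order_sensitivity(k)))
print(best_k)
```

In this toy setting every candidate gives the same nominal cost, so the choice among them is made purely on the second-order sensitivity; with a vector parameter the scalar second derivative would be replaced by a norm of the matrix of second derivatives, estimated in the same finite-difference fashion.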


In the next section, we will observe a similar
effective role played by coordinator policies in
the context of a different type of system inaccuracy.


4. COORDINATING POLICIES WITH OPTIMUM
SENSITIVITY PROPERTIES:
INACCURACIES IN GOAL PERCEPTIONS

Consider the problem formulation of Section 2, but
with $J_1 = J_2 = J_3 = J$ (nominally), and $\alpha = \alpha^o$, a fixed
value known by all stations. This is then,
basically an optimal control problem, and under
appropriate convexity conditions the optimizing
control policies $(\gamma_1^o, \gamma_2^o, \gamma_3^o)$ will be
unique as far as their open-loop values go, but
nonunique otherwise; in other words, when the
underlying information pattern is dynamic, we have
equivalence classes $\Gamma_1^o$, $\Gamma_2^o$, $\Gamma_3^o$ with the property that


3 


robustifying the overall performance against
inaccuracies or discrepancies in perceptions at the
stations, provided that one station is given a
superior role in coordinating the policies of the
other stations. In the previous section we have
allowed for inaccuracies in the modeling of the
state equation, and have argued that it is possible
to find an optimally coordinating policy which also
renders the overall performance minimally sensitive
or insensitive to deviations in the state model
from the nominal. Note that if the nominal problem
is taken as a team problem (with identical cost
functions for all stations), derivation of minimally
sensitive policies requires consideration of a
dynamic game, because the inaccuracies in modeling
cannot be handled in the framework of team problems.
Likewise, when the inaccuracy is in the goal
perception of the lower-level stations (which will be
the topic in this section), derivation of minimally
sensitive policies asks for a game theoretic
analysis.


F(-r£  )  *  -J( 

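where the latter has been defined by (4.1), and
furthermore

$$F(\gamma_1, \beta) \ge F(\gamma_1, \beta^o), \quad \forall\, \gamma_1 \in \bar{\Gamma}_1. \qquad (4.5)$$

Hence, in order to satisfy property (ii) above,
$\gamma_1$ has to be chosen so as to minimize the leading
terms in the expansion of $F(\gamma_1, \beta) - F(\gamma_1, \beta^o)$
around $\beta^o$. If $\beta \in \mathbb{R}^n$, the first and second order
terms are

$$\delta F(\gamma_1, \beta) = \sum_{i=1}^{n} \left[ \frac{dF(\gamma_1, \beta)}{d\beta_i} \right]_{\beta = \beta^o} (\beta_i - \beta_i^o) \qquad (4.6)$$

$$\delta^2 F(\gamma_1, \beta) = \sum_{j=1}^{n} \sum_{i=1}^{n} \left[ \frac{d^2 F(\gamma_1, \beta)}{d\beta_i\, d\beta_j} \right]_{\beta = \beta^o} (\beta_i - \beta_i^o)(\beta_j - \beta_j^o)$$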


Towards this end, let $\beta$ be a parameter with nominal
value $\beta^o$, affecting directly only the cost functions
but not the state equation; that is,
$J_i = J_i(\gamma_1, \gamma_2, \gamma_3; \beta)$, with the further property that nominally,

W: 


respectively. The first of these is in general
independent of $\gamma_1$ (as in (3.8)), but the second
one does depend on $\gamma_1$, thus making the problem of
minimizing a suitable norm of the matrix


with $J$ being the team cost function used in (4.1).
Now, if the lower level stations had the same
(common) perception of the common cost function
(which corresponds to $\beta = \beta^o$), every triple in
$\Gamma_1^o \times \Gamma_2^o \times \Gamma_3^o$ would constitute an optimal solution. If
the lower level stations have a different perception
of $J$, this discrepancy being quantified in the
value of $\beta$ (say $\beta_2$ for station 2 and $\beta_3$ for station
3), the optimization problems at the lower levels
do not necessarily admit solutions in $\Gamma_2^o \times \Gamma_3^o$, that
is,


arg  min  ;  j  .£“>  % 


ir^,  min  i  )  2  <4. 2b) 

•  ,  3  j.  -  i  3 

_ o  „o  _o 

with $(\gamma_1^o, \gamma_2^o, \gamma_3^o) \in \Gamma_1^o \times \Gamma_2^o \times \Gamma_3^o$. Hence, a relevant
question here is whether there exists a $\gamma_1^* \in \Gamma_1^o$ with
the properties that

if  mm  L>  is  inceoendent  of 

*.  i  -  3 

•  ;*i  ;  i.  ;  =  l,o)  for  * 1  in  an  open  neighborhood 


If  ■  t  =  arg  min  J..  ^  ) ,  i-2 , 3 , 


. )  given  by  vh 


when  and 


a small neighborhood of $\beta^o$. In other words, we
wish to find that $\gamma_1$ which protects the overall
system performance (and maintains optimality)
against incremental deviations in the goal
perceptions of the lower level stations from the nominal
team cost function (as quantified in the value of $\beta$).

This again asks for a minimum sensitivity analysis
which could be carried out following the lines of
argument of Section 3. Let $\bar{\Gamma}_1$ be a subset of $\Gamma_1^o$
which comprises coordinator policies having
property (i) above. For each $\gamma_1 \in \bar{\Gamma}_1$, let


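$$M(\gamma_1) = \left\{ \left[ \frac{d^2 F(\gamma_1, \beta)}{d\beta_j\, d\beta_i} \right]_{\beta = \beta^o} \right\}$$

over $\bar{\Gamma}_1$ a meaningful one. The desired $\gamma_1^*$ is
then the argument of $\min_{\gamma_1 \in \bar{\Gamma}_1} \| M(\gamma_1) \|$, which carries
with it appealing sensitivity properties, as a
coordinating policy for station 1.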
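To make explicit why a small matrix norm translates into low sensitivity, the following worked bound may help. It is a sketch under assumptions not stated in the text: $F$ is taken to be twice continuously differentiable in $\beta$, the symbol $M(\gamma_1)$ is used here for the matrix of second derivatives of $F$ with respect to $\beta$ at $\beta^o$ and is assumed nonnegative definite, $\|\cdot\|$ is taken to be the spectral norm, and the factor $1/2$ follows the usual Taylor convention.

```latex
\begin{align*}
F(\gamma_1,\beta) - F(\gamma_1,\beta^o)
  &= \nabla_\beta F(\gamma_1,\beta^o)^{\top}\,\Delta\beta
   + \tfrac{1}{2}\,\Delta\beta^{\top} M(\gamma_1)\,\Delta\beta
   + o\!\left(\|\Delta\beta\|^{2}\right),
   \qquad \Delta\beta = \beta - \beta^o, \\
0 \;\le\; \Delta\beta^{\top} M(\gamma_1)\,\Delta\beta
  &\;\le\; \|M(\gamma_1)\|\,\|\Delta\beta\|^{2}.
\end{align*}
```

When the first-order term does not depend on $\gamma_1$, the policy-dependent part of the deviation is thus bounded, to second order, by $\tfrac{1}{2}\|M(\gamma_1)\|\,\|\Delta\beta\|^2$ uniformly over the direction of $\Delta\beta$, which is why minimizing the norm over the admissible coordinator policies is a natural criterion.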

Such an analysis has been carried out for some
special cases and has been found to lead to some
robust policies with simple structures (Cansever and
Basar, 1982; Cansever, Basar and Cruz, 1983). The
special cases treated in both references pertain
to the situation when there is only one station at
the lower level and the nominal problem is defined
in discrete time, with two stages. In the former
the cost functionals at the upper and lower levels
are taken to be nominally different, whereas in
the latter reference they are taken to be nominally
the same, which leads to higher degrees of freedom
in the choice of leader policies and could
result in total insensitivity of the overall
performance to deviations in the goal perceptions of
the lower level.


In  the  continuous -time 
station  at  the  lower  le 
analysis  will  be  more  c 
basically  follow  the  ge 
Its  feasioilicy  and  com 
will  depend  to  a  great 


It  is  also  possible  to 
the  uncertaintv  is  hoc 
e q ua cion,  as  in  Seccio 
Minimum  sensitivitv  co 
obtained  for  that  .more 
in  principle,  by  folio1 
this  and  the  previous 


level  'as  ado 
complicated , 
general  steps 
.’mpucat  ionai 
:  extent  on  t 
^  * s  ana  their 
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:h  in  the  mod 
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This  paper  has  discussed  : i  o  general  *: 
strategies  in  the  cnoriinati-'n  into 


'Vstems  when  there  is  ietrr-in i st  i 
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The  “'r  mesed  mpr'.ic  :  ha.*  'een  ■  son 
maim**  is  in  i  gzr.o-t  •  ^  ;  ’r amev 

u.ncerta  inn  v  t-'  .  m  :  :*•- 1  l  • 


r. :  tria*  n  t  v 


S'aclonal  Economies . 


Ho, Y. C., P. B. Luh and R. Muralidharan
(1981). Information structure, Stackelberg
games and incentive controllability.
IEEE Trans. on Automatic Control, AC-26,
454-460.

Kokotovic, P. V., J. B. Cruz, Jr., J. E.
Heller and P. Sannuti (1968). Synthesis
of optimally sensitive systems. Proc.
IEEE, 56, 8, 1318-1324.

Papavassilopoulos, G. P. and J. B. Cruz, Jr.
(1979). Nonclassical control problems
and Stackelberg games. IEEE Trans. on
Automatic Control, AC-24, 2, 155-166.

Sundararajan, N. and J. B. Cruz, Jr. (1972).
Sensitivity reduction in time-varying
linear and nonlinear systems. Int. J. of
Control, 15, 5, 937-943.

Tolwinski, B. (1981). Closed-loop Stackelberg
solution to multistage linear quadratic


